BioJSON Format Documentation

Welcome to BioJSON

BioJSON Version 1.0 is a JavaScript Object Notation (JSON) specification for the representation and exchange of single multiple sequence alignments (MSAs) with their annotations, features, groups, hidden sequences, hidden columns and colour schemes. It also allows custom application settings to be encoded for the Jalview (www.jalview.org) multiple alignment workbench.

Future versions of BioJSON will provide more complete support for the interchange of multiple alignment data and associated metadata. It will also introduce new representations for phylogenetic data.

Although Version 1.0 is specific to multiple sequence alignments, we think BioJSON could be extended to allow mobile, web-based and standalone bioinformatics applications of many types to easily exchange data and three-dimensional structures.

The first version of the specification was developed in 2015 and implemented in Jalview 2.9.

Get involved

If you have any comments about the current BioJSON or want to contribute to the development of BioJSON then please file an issue.

Why use BioJSON

JSON is generally useful as a lightweight data exchange format and is now often used in preference to more verbose solutions such as XML. BioJSON has been built on JSON to extend the data exchange options possible with current multiple alignment file formats and to allow straightforward additions to the exchange medium that can accommodate annotations and other metadata as well as application-specific settings or parameters.

If you are presenting multiple alignment-based data on the web, then BioJSON V 1.0 provides a method to describe the alignment and its annotations. BioJSON V 1.0 also allows the straightforward import/export of these data to/from the Jalview desktop application

Future versions of BioJSON will provide more complete support for the interchange of multiple alignment data. It will also introduce new representations for phylogenetic data and 3D structures.

BioJSON v1 UML Diagram

The UML diagram below illustrates the overall structure of a BioJSON document. The minimal component of a BioJSON document is a single "Sequence" with all compulsory fields completed. Compulsory fields are marked with asterisks (*) in the UML model.

A Sequence may include SequenceFeatures that define sequence specific annotations. For example phosphorylation sites, ferredoxin sites, etc. The snippet below shows a BioJSON SequenceFeature representation for a phosphorylation site.

{
    "fillColor": "#ffffff",
    "score": 0,
    "sequenceRef": "1699378708",
    "featureGroup": "netphos",
    "description": "High confidence server. Only hits with scores over 0.8 are reported.",
    "links": [
        "PHOSPHORYLATION (S) 38_38|http://www.cbs.dtu.dk/cgi-bin/proview/webface-link?seqid=Q93XJ9&service=NetPhos-2.0"
    ],
    "xStart": 46,
    "xEnd": 47,
    "type": "PHOSPHORYLATION (S)"
}

An Alignment is made up of an array of Sequences that may optionally include permitted gap characters. Given an alignment of two or more sequences an AlignmentAnnotation describes an annotation that applies to the complete alignment. This might be a profile or text-based annotation that defines the "state" of each position in the alignment.

SequenceGroups can be defined for an alignment - these are rectangular regions bounded by startRes and endRes positions in the alignment coordinate system for a set of sequences. The snippet below is an example BioJSON object for a SequenceGroup.

{
    "displayText": true,
    "sequenceRefs": [
        "35272964",
        "1463982249",
        "352516542",
        "2017195779",
        "935935690",
        "1403030731",
        "1464266229",
        "1954867795",
        "14155760",
        "619576438",
        "1515791936",
        "71516988",
        "1807988066",
        "1758840270",
        "872325958"
    ],
    "startRes": 59,
    "groupName": "ferredoxin",
    "endRes": 124,
    "colourText": false,
    "showNonconserved": false,
    "colourScheme": "Zappo",
    "displayBoxes": true
}

AppSettings stores key=value pairs of custom application specific settings (i.e visualisation settings, etc) for different applications that consume or generate BioJSON.

Some UML notations and their meanings

1 → only one
0..1 → zero or one
0..* → zero or many
1..1 → one to one
1..* → one or many

The BioJSON Schema

BioJSON is specified using version 4 of the jsonschema standard. The schema is available via the github repository or can be downloaded from here. You can explore the BioJSON schema with the interactive visualisation created with Docson.

Instructions for interacting with the Docson schema

When the mouse is hovered over the elements in the BioJSON schema section, traversable elements become highlighted and clickable
Click an object’s “object” button to expand it, or the plus “+” link to expand all objects
Click the minus “-” to collapse all objects
The parenthesis “{}” link shows the underlying JSON schema source which provides a formal description of the BioJSON format
Expanding an object reveals its variables, their type and a brief description in the object, their data types and the description of each variable

Colour schemes

More details of the following colour schemes referenced in BioJSON can be explored by clicking on it.

The following colour schemes where referenced in BioJSON schema. Click on any of them to view more detail informations about the scheme.

Click any of the following colour scheme referenced in BioJSON to view a more detailed information about it.

% Identity	Purine /Pyrimidine
Blosum62	RNA Helices
Buried	Strand
Clustal	T-Coffee Scores
Helix	Taylor
Hydro	Turn
Nucleotide	Zappo

Sample BioJSON flat-file

Download a minimalistic BioJSON data or view it here.


{
    "seqs": [
        {
            "name": "FER_CAPAN/125-133",
            "type": "Protein",
            "start": 125,
            "svid": "1.0",
            "end": 133,
            "id": "1856498783",
            "seq": "VAYPQS-DV-E",
            "order": 1
        },
        {
            "name": "FER1_SOLLC/125-133",
            "type": "Protein",
            "start": 125,
            "svid": "1.0",
            "end": 133,
            "id": "1603641013",
            "seq": "VAYPKG-DV-E",
            "order": 2
        },
        {
            "name": "Q93XJ9_SOLTU/125-134",
            "type": "Protein",
            "start": 125,
            "svid": "1.0",
            "end": 134,
            "id": "477931709",
            "seq": "VAYPKCDVT-E",
            "order": 3
        },
        {
            "name": "FER1_PEA/130-138",
            "type": "Protein",
            "start": 130,
            "svid": "1.0",
            "end": 138,
            "id": "1011478371",
            "seq": "V-YPTSDVV-E",
            "order": 4
        },
        {
            "name": "Q7XA98_TRIPR/133-141",
            "type": "Protein",
            "start": 133,
            "svid": "1.0",
            "end": 141,
            "id": "1405185061",
            "seq": "V--PTSDVTIE",
            "order": 5
        }
    ],
    "appSettings": {
        "globalColorScheme": "foo",
        "webStartUrl": "http://www.jalview.org/services/launchApp",
        "application": "Jalview",
        "hiddenSeqs": "1603641013",
        "showSeqFeatures": "true",
        "version": "Test",
        "hiddenCols": "0-0;10-10"
    },
    "seqGroups": [
        {
            "displayText": true,
            "startRes": 2,
            "groupName": "JGroup:48822124",
            "endRes": 4,
            "colourText": false,
            "sequenceRefs": [
                "1856498783",
                "477931709"
            ],
            "svid": "1.0",
            "showNonconserved": false,
            "colourScheme": "Zappo",
            "displayBoxes": true
        }
    ],
    "alignAnnotation": [
        {
            "svid": "1.0",
            "annotations": [
                {
                    "displayCharacter": "",
                    "value": 0,
                    "secondaryStructure": "\u0000"
                },
                {
                    "displayCharacter": "",
                    "value": 0,
                    "secondaryStructure": "\u0000"
                },
                {
                    "displayCharacter": "",
                    "value": 0,
                    "secondaryStructure": "\u0000"
                },
                {
                    "description": "",
                    "displayCharacter": "β",
                    "value": 0,
                    "secondaryStructure": "E"
                },
                {
                    "description": "",
                    "displayCharacter": "β",
                    "value": 0,
                    "secondaryStructure": "E"
                },
                {
                    "displayCharacter": "",
                    "value": 0,
                    "secondaryStructure": "\u0000"
                },
                {
                    "description": "",
                    "displayCharacter": "α",
                    "value": 0,
                    "secondaryStructure": "H"
                },
                {
                    "description": "",
                    "displayCharacter": "α",
                    "value": 0,
                    "secondaryStructure": "H"
                },
                {
                    "displayCharacter": "",
                    "value": 0,
                    "secondaryStructure": "\u0000"
                },
                {
                    "displayCharacter": "",
                    "value": 0,
                    "secondaryStructure": "\u0000"
                },
                {
                    "displayCharacter": "",
                    "value": 0,
                    "secondaryStructure": "\u0000"
                }
            ],
            "description": "Secondary Structure",
            "label": "Secondary Structure"
        }
    ],
    "svid": "1.0",
    "seqFeatures": [
        {
            "fillColor": "#8db520",
            "score": 0,
            "sequenceRef": "477931709",
            "featureGroup": "Jalview",
            "svid": "1.0",
            "description": "1a45",
            "xStart": 5,
            "xEnd": 9,
            "type": "RESNUM"
        },
        {
            "fillColor": "#8db520",
            "score": 0,
            "sequenceRef": "1011478371",
            "featureGroup": "Jalview",
            "svid": "1.0",
            "description": "1a45",
            "xStart": 5,
            "xEnd": 9,
            "type": "RESNUM"
        },
        {
            "fillColor": "#8db520",
            "score": 0,
            "sequenceRef": "1405185061",
            "featureGroup": "Jalview",
            "svid": "1.0",
            "description": "1a45",
            "xStart": 5,
            "xEnd": 9,
            "type": "RESNUM"
        }
    ]
}

Sample HTML embedded with BioJSON

The image below is a screenshot of a Jalview multiple sequence alignment rendered with SVG and exported as a HTML file embedded with BioJSON.

Click here or the screenshot to view the live example. You can import the embedded BioJSON to Jalview by clicking the “Launch in Jalview” button on the example page. You can also view the embedded BioJSON for the page by clicking the “View raw BioJSON Data” button.

See BioJSON in Action!

The current application of BioJSON in Jalview 2.9 includes:

Export and import of BioJSON data as JSON flat files
Export and import of supporting data for interactive web pages, including:
- Interactive SVG figures (File → Export Image → HTML)
- The BioJS MSA viewer (File → Export Image → BioJS)

Future Applications

Some of the future applications of BioJSON include:

Underlying data format for Jalview web-services
Representation of linked cDNA and Protein alignments views
Data exchange format for Javascript version of Jalview (JalviewJS)