The format of the instance file

An example

An example of the instance file is given below. Each section is explained in detail under "Fields".

Instance        1
Boundary        chr1    3205904 3671497 -
ReadLen 100
Segs    13
3205904 3207317 1414    52      9       0       7       0.0219236       3.29208
3207318 3213438 6121    18      3       0       0       0.772259        0.278386
3213439 3214481 1043    33      8       7       3       0.285714        2.47651
3214482 3215632 1151    54      11      3       2       0       4.40487
3215633 3216968 1336    49      9       2       2       0.0127246       3.26422
3216969 3421701 204733 106     16      0       0       0.986851        0.0486634
3421702 3421901 200     25      14      8       9       0       8.595
3421902 3648310 226409 46      4       0       0       0.984952        0.0191954
3648311 3650509 2199    2       2       0       0       0.933151        0.0909504
3650510 3658846 8337    0       0       0       0       1       0
3658847 3658904 58      0       0       0       0       1       0
3658905 3670551 11647   37      3       0       1       0.775565        0.303512
3670552 3671497 946     77      17      10      0       0.105708        6.82135
Refs    3
1 0 1 1 0 0 0 0 0 0 0 0 0       -       uc007aet.1
0 0 0 1 1 0 1 0 0 0 0 0 1       -       uc007aeu.1
0 0 0 0 0 0 0 0 1 0 1 0 0       -       uc007aev.1
Reads   468
SGTypes 19
1 0 0 0 0 0 0 0 0 0 0 0 0 45    0
1 0 1 0 0 0 0 0 0 0 0 0 0 7     -1
0 1 0 0 0 0 0 0 0 0 0 0 0 18    0
0 0 1 0 0 0 0 0 0 0 0 0 0 23    0
0 0 1 1 0 0 0 0 0 0 0 0 0 3     0
0 0 0 1 0 0 0 0 0 0 0 0 0 49    0
0 0 0 1 1 0 0 0 0 0 0 0 0 2     0
0 0 0 0 1 0 0 0 0 0 0 0 0 45    0
0 0 0 0 1 0 1 0 0 0 0 0 0 2     -1
0 0 0 0 0 1 0 0 0 0 0 0 0 100   0
0 0 0 0 0 1 1 0 0 0 0 0 0 6     -1
0 0 0 0 0 0 1 0 0 0 0 0 0 8     0
0 0 0 0 0 0 1 0 0 0 0 0 1 9     -1
0 0 0 0 0 0 0 1 0 0 0 0 0 44    0
0 0 0 0 0 0 0 1 0 0 0 0 0 1     1
0 0 0 0 0 0 0 0 1 0 0 0 0 2     0
0 0 0 0 0 0 0 0 0 0 0 1 0 36    0
0 0 0 0 0 0 0 0 0 0 0 1 1 1     0
0 0 0 0 0 0 0 0 0 0 0 0 1 67    0
PETypes 224     24
1       1       16
-56:1 -45:2 -43:1 -42:2 -31:2 -25:1 -24:1 -23:1 -15:1 -3:1 -1:1 1:1 5:2 16:1 54:1 79:1
1       2       5
-37:1 -33:1 9:1 20:1 34:1
2       4       2
-37:1 71:1
3       3       8
-94:1 -11:1 18:1 24:1 42:1 92:1 129:1 131:1
4       4       10
-65:1 -60:1 -55:1 -45:1 -29:1 -25:1 -13:1 -3:1 13:1 19:1
... skip 19*2 lines ...
Coverage        19      468
0       2
1,43    2,1
1       2
1,5     2,1
2       1
1,18
3       2
1,19    2,2
4       2
1,1     2,1
5       2
1,45    2,2
6       1
1,2
... skip 12*2 lines ...

Fields

Instance

Consists of only one line: the word "Instance" and the ID of this instance.

Boundary

Consists of only one line: the word "Boundary", followed by the chromosome, start coordinate, end coordinate, and orientation of this instance. The coordinates are 1-based and inclusive: the first base of the reference is 1, and "[1 100]" denotes the 100 bases from base 1 to base 100.

Note: if isolasso cannot determine the orientation, the orientation field will be ".".
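
As a minimal sketch of how this line could be read in Python, the fields can be split on whitespace and the span computed with the 1-based inclusive convention. The function name parse_boundary and the dictionary keys are hypothetical and are not part of isolasso.

```python
def parse_boundary(line):
    """Parse a 'Boundary' line into its fields (hypothetical helper)."""
    keyword, chrom, start, end, strand = line.split()
    if keyword != "Boundary":
        raise ValueError("not a Boundary line: %r" % line)
    start, end = int(start), int(end)
    return {
        "chrom": chrom,
        "start": start,
        "end": end,
        # "." means the orientation could not be determined
        "strand": None if strand == "." else strand,
        # 1-based inclusive coordinates: both end points are counted
        "length": end - start + 1,
    }

boundary = parse_boundary("Boundary        chr1    3205904 3671497 -")
print(boundary["chrom"], boundary["strand"], boundary["length"])
```

Mapping "." to None is a design choice of this sketch, so that an undetermined orientation is easy to test for.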

ReadLen

One line: the word "ReadLen" and the read length.

Segs

The definition of segments, and the basic statistics of each segment.

The first line is the word "Segs", followed by an integer n giving the number of segments in this instance.
The first line is followed by n lines, one for each defined segment. For example:

3205904 3207317 1414 52 9 0 7 0.0219236 3.29208

Each line consists of 9 fields:

Segment start (1-based)
Segment end (1-based, inclusive)
Length of the segment (equal to segment end - segment start + 1)
Reads falling onto this segment
Max read coverage
The coverage on the leftmost base of the segment (i.e., segment start)
The coverage on the rightmost base of the segment (i.e., segment end)
The fraction of bases with 0 coverage (a value between 0 and 1)
Mean coverage
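
The nine fields above can be read as in the following sketch, which also checks the length field against the coordinates. The Segment type, its attribute names, and parse_segment are hypothetical names chosen for illustration. The sketch assumes the first seven fields are integers, as they are in the example.

```python
from typing import NamedTuple

class Segment(NamedTuple):
    start: int        # 1-based
    end: int          # 1-based, inclusive
    length: int
    reads: int
    max_cov: int
    left_cov: int     # coverage at segment start
    right_cov: int    # coverage at segment end
    zero_cov: float   # eighth field: bases with 0 coverage
    mean_cov: float

def parse_segment(line):
    """Parse one line of the Segs section (hypothetical helper)."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError("expected 9 fields, got %d" % len(fields))
    seg = Segment(*[int(x) for x in fields[:7]],
                  float(fields[7]), float(fields[8]))
    # The third field should agree with the inclusive coordinates.
    if seg.length != seg.end - seg.start + 1:
        raise ValueError("length field disagrees with coordinates")
    return seg

seg = parse_segment("3205904 3207317 1414 52 9 0 7 0.0219236 3.29208")
print(seg.length, seg.reads, seg.mean_cov)
```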

Refs

If a reference annotation is provided (-x option), this field records the structure of these annotations.

The first line consists of the word "Refs" and an integer n giving the number of annotated transcripts (0 if no annotation is provided).
The first line is followed by n lines, one for each annotated transcript. For example,

1 0 1 1 0 0 0 0 0 0 0 0 0 - uc007aet.1

Each line consists of 3 fields, separated by tabs:

A binary vector with one entry per segment defined in the "Segs" section: 1 if this annotation contains that segment, 0 otherwise. The entries are separated by spaces.
The orientation of this annotation.
The ID of this annotation.
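
A short sketch of decoding one such line follows. It splits on any whitespace, so it works whether the fields are separated by tabs or spaces, and it assumes the annotation ID contains no whitespace. The name parse_ref and the returned keys are hypothetical.

```python
def parse_ref(line):
    """Parse one line of the Refs section (hypothetical helper)."""
    # The last two tokens are the orientation and the ID;
    # everything before them is the per-segment binary vector.
    *flags, strand, name = line.split()
    if any(flag not in ("0", "1") for flag in flags):
        raise ValueError("segment vector must contain only 0 and 1")
    # Report the 1-based indices of the segments the annotation contains.
    segments = [i for i, flag in enumerate(flags, start=1) if flag == "1"]
    return {"id": name, "strand": strand, "segments": segments}

ref = parse_ref("1 0 1 1 0 0 0 0 0 0 0 0 0       -       uc007aet.1")
print(ref["id"], ref["strand"], ref["segments"])
```

For the example line, the annotation uc007aet.1 covers segments 1, 3, and 4 of the 13 segments.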

Reads

One line, the number of reads in this instance.

SGTypes

This section defines the different single-end read types.

The first line consists of the word "SGTypes" and an integer n showing the number of defined types, followed by n lines defining each type. For example,

1 0 1 0 0 0 0 0 0 0 0 0 0 7 -1

Each line consists of 2 fields, separated by tabs:

(n+1) integers for n segments, separated by spaces. The first n integers are binary and show whether this read type contains each segment defined in the "Segs" section (as in the records of the "Refs" section). The last integer is the number of reads of this type.
The orientation of this read type, if the read is a splice-junction read; 0 if the orientation cannot be determined.
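
The layout of an SGTypes line can be sketched as below. The function name parse_sgtype and its n_segments parameter are hypothetical; n_segments is assumed to be the value read from the "Segs" line.

```python
def parse_sgtype(line, n_segments):
    """Parse one line of the SGTypes section (hypothetical helper)."""
    fields = line.split()
    # n binary flags, then the read count, then the orientation.
    if len(fields) != n_segments + 2:
        raise ValueError("expected %d fields, got %d"
                         % (n_segments + 2, len(fields)))
    flags = [int(x) for x in fields[:n_segments]]
    count = int(fields[n_segments])
    orientation = int(fields[n_segments + 1])
    # 1-based indices of the segments covered by this read type.
    segments = [i for i, flag in enumerate(flags, start=1) if flag == 1]
    return segments, count, orientation

segments, count, orientation = parse_sgtype(
    "1 0 1 0 0 0 0 0 0 0 0 0 0 7     -1", n_segments=13)
print(segments, count, orientation)
```

Here the read type spans segments 1 and 3, is supported by 7 reads, and has orientation -1.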

PETypes

This section defines the paired-end read types, if paired-end RNA-Seq reads are provided.

The first line consists of three fields: the word "PETypes", the number of paired-end reads, and an integer n giving the number of paired-end types. The first line is followed by 2*n lines, with every 2 lines describing one paired-end type. For example,

1 1 16
-56:1 -45:2 -43:1 -42:2 -31:2 -25:1 -24:1 -23:1 -15:1 -3:1 -1:1 1:1 5:2 16:1 54:1 79:1

The first line consists of 3 numbers: the SGType of the first read of the paired-end read type (numbered from 1), the SGType of the second read of the paired-end read type, and the number of fields in the second line.

The second line lists, in the form distance:count, the distance between the two reads of a pair and the number of paired-end reads with that distance. For example, "-56:1" means there is 1 paired-end read whose two reads are -56 bp apart.
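
A two-line PETypes record can be read as in this sketch, which turns the second line into a list of (distance, count) pairs and checks it against the field count from the first line. The name parse_petype is hypothetical.

```python
def parse_petype(header, distances):
    """Parse one two-line PETypes record (hypothetical helper)."""
    first, second, n_fields = (int(x) for x in header.split())
    items = distances.split()
    if len(items) != n_fields:
        raise ValueError("expected %d fields, got %d"
                         % (n_fields, len(items)))
    pairs = []
    for item in items:
        # Each field is distance:count; the distance may be negative.
        dist, count = item.split(":")
        pairs.append((int(dist), int(count)))
    return first, second, pairs

first, second, pairs = parse_petype("1       2       5",
                                    "-37:1 -33:1 9:1 20:1 34:1")
n_reads = sum(count for _, count in pairs)
print(first, second, pairs, n_reads)
```

Summing the counts gives the number of paired-end reads of this type, which is 5 for the record used here.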

Coverage

This section gives more detailed coverage statistics for each SGType.

The first line consists of three fields: the word "Coverage", the number of records following this line (i.e., the number of SGTypes), and the total number of reads used for calculating the coverage (generally the same as the "Reads" field).
If there are n SGTypes, there will be 2*n lines following the first line. For example,

0 2
1,43 2,1

The first line consists of 2 numbers: the index of the SGType (numbered from 0) and the number of fields in the second line.
The second line lists, in the form coverage,bases, each coverage value and the number of bases having that coverage.
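
A Coverage record can be decoded in the same way, as in the sketch below. The name parse_coverage is hypothetical. In the example file, the sum of coverage times bases for index 0 is 45, which equals the read count on the first SGTypes line; this is an observation on the example only, not a documented guarantee of the format.

```python
def parse_coverage(header, values):
    """Parse one two-line Coverage record (hypothetical helper)."""
    index, n_fields = (int(x) for x in header.split())
    items = values.split()
    if len(items) != n_fields:
        raise ValueError("expected %d fields, got %d"
                         % (n_fields, len(items)))
    hist = {}
    for item in items:
        # Each field is coverage,bases
        cov, bases = item.split(",")
        hist[int(cov)] = int(bases)
    return index, hist

index, hist = parse_coverage("0       2", "1,43    2,1")
total = sum(cov * bases for cov, bases in hist.items())
print(index, hist, total)
```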