The format of the instance file
An example
An example instance file is shown below. Each section is explained in detail under "Fields".
Instance 1
Boundary chr1 3205904 3671497 -
ReadLen 100
Segs 13
3205904 3207317 1414 52 9 0 7 0.0219236 3.29208
3207318 3213438 6121 18 3 0 0 0.772259 0.278386
3213439 3214481 1043 33 8 7 3 0.285714 2.47651
3214482 3215632 1151 54 11 3 2 0 4.40487
3215633 3216968 1336 49 9 2 2 0.0127246 3.26422
3216969 3421701 204733 106 16 0 0 0.986851 0.0486634
3421702 3421901 200 25 14 8 9 0 8.595
3421902 3648310 226409 46 4 0 0 0.984952 0.0191954
3648311 3650509 2199 2 2 0 0 0.933151 0.0909504
3650510 3658846 8337 0 0 0 0 1 0
3658847 3658904 58 0 0 0 0 1 0
3658905 3670551 11647 37 3 0 1 0.775565 0.303512
3670552 3671497 946 77 17 10 0 0.105708 6.82135
Refs 3
1 0 1 1 0 0 0 0 0 0 0 0 0 - uc007aet.1
0 0 0 1 1 0 1 0 0 0 0 0 1 - uc007aeu.1
0 0 0 0 0 0 0 0 1 0 1 0 0 - uc007aev.1
Reads 468
SGTypes 19
1 0 0 0 0 0 0 0 0 0 0 0 0 45 0
1 0 1 0 0 0 0 0 0 0 0 0 0 7 -1
0 1 0 0 0 0 0 0 0 0 0 0 0 18 0
0 0 1 0 0 0 0 0 0 0 0 0 0 23 0
0 0 1 1 0 0 0 0 0 0 0 0 0 3 0
0 0 0 1 0 0 0 0 0 0 0 0 0 49 0
0 0 0 1 1 0 0 0 0 0 0 0 0 2 0
0 0 0 0 1 0 0 0 0 0 0 0 0 45 0
0 0 0 0 1 0 1 0 0 0 0 0 0 2 -1
0 0 0 0 0 1 0 0 0 0 0 0 0 100 0
0 0 0 0 0 1 1 0 0 0 0 0 0 6 -1
0 0 0 0 0 0 1 0 0 0 0 0 0 8 0
0 0 0 0 0 0 1 0 0 0 0 0 1 9 -1
0 0 0 0 0 0 0 1 0 0 0 0 0 44 0
0 0 0 0 0 0 0 1 0 0 0 0 0 1 1
0 0 0 0 0 0 0 0 1 0 0 0 0 2 0
0 0 0 0 0 0 0 0 0 0 0 1 0 36 0
0 0 0 0 0 0 0 0 0 0 0 1 1 1 0
0 0 0 0 0 0 0 0 0 0 0 0 1 67 0
PETypes 224 24
1 1 16
-56:1 -45:2 -43:1 -42:2 -31:2 -25:1 -24:1 -23:1 -15:1 -3:1 -1:1 1:1 5:2 16:1 54:1 79:1
1 2 5
-37:1 -33:1 9:1 20:1 34:1
2 4 2
-37:1 71:1
3 3 8
-94:1 -11:1 18:1 24:1 42:1 92:1 129:1 131:1
4 4 10
-65:1 -60:1 -55:1 -45:1 -29:1 -25:1 -13:1 -3:1 13:1 19:1
... skip 19*2 lines ...
Coverage 19 468
0 2
1,43 2,1
1 2
1,5 2,1
2 1
1,18
3 2
1,19 2,2
4 2
1,1 2,1
5 2
1,45 2,2
6 1
1,2
... skip 12*2 lines ...
Fields
Instance
Consists of only one line: the word "Instance" and the ID of this instance.
Boundary
Consists of only one line: the word "Boundary", followed by the
chromosome, start coordinate, end coordinate and orientation of this
instance. The coordinates are 1-based and inclusive: the first base of the reference is 1, and "[1 100]" denotes the 100 bases from base 1 to base 100.
Note: if IsoLasso cannot determine the orientation, the orientation field will be ".".
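As an illustration of the 1-based inclusive convention, the following is a minimal Python sketch that parses a Boundary line. The names Boundary, parse_boundary and span_length are hypothetical helpers introduced here, not part of IsoLasso, and the sketch assumes the five fields are separated by whitespace.

```python
from typing import NamedTuple


class Boundary(NamedTuple):
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # orientation field; "." when it could not be determined


def parse_boundary(line: str) -> Boundary:
    """Parse a line such as 'Boundary chr1 3205904 3671497 -'."""
    fields = line.split()
    if len(fields) != 5 or fields[0] != "Boundary":
        raise ValueError(f"not a Boundary line: {line!r}")
    _, chrom, start, end, strand = fields
    return Boundary(chrom, int(start), int(end), strand)


def span_length(b: Boundary) -> int:
    # With inclusive 1-based coordinates, [1, 100] covers 100 bases.
    return b.end - b.start + 1
```

For the example instance, span_length gives the number of bases between 3205904 and 3671497, both ends included.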
ReadLen
One line: the word "ReadLen" followed by the read length.
Segs
The definition of the segments and the basic statistics of each segment.
The first line is the word "Segs", followed by an integer n giving the number of segments in this instance.
The first line is followed by n lines, one for each defined segment. For example:
3205904 3207317 1414 52 9 0 7 0.0219236 3.29208
Each line consists of 9 fields:
- Segment start (1-based)
- Segment end (1-based, inclusive)
- Length of the segment (equal to segment end - segment start + 1)
- Number of reads falling onto this segment
- Max read coverage
- The coverage on the leftmost base of the segment (i.e., segment start)
- The coverage on the rightmost base of the segment (i.e., segment end)
- The fraction of bases with 0 coverage (a value between 0 and 1)
- Mean coverage
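The nine fields can be read into a small record, as in the Python sketch below. Segment and parse_segment are hypothetical names introduced here. The sketch assumes whitespace-separated fields and assumes the first seven fields are integers, as they are in the example. It also checks the stated relation between start, end and length.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    start: int        # 1-based
    end: int          # 1-based, inclusive
    length: int       # end - start + 1
    reads: int        # reads falling onto this segment
    max_cov: int      # max read coverage
    left_cov: int     # coverage at segment start
    right_cov: int    # coverage at segment end
    zero_frac: float  # share of bases with 0 coverage
    mean_cov: float   # mean coverage


def parse_segment(line: str) -> Segment:
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}: {line!r}")
    seg = Segment(*map(int, fields[:7]), float(fields[7]), float(fields[8]))
    # The length field should agree with the inclusive coordinates.
    if seg.length != seg.end - seg.start + 1:
        raise ValueError(f"inconsistent segment length: {line!r}")
    return seg
```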
Refs
If a reference annotation is provided (-x option), this section records the structure of the annotated transcripts.
The first line consists of the word "Refs" and an integer n giving the
number of annotated transcripts (0 if no annotation is provided).
The first line is followed by n lines, one for each annotated transcript. For example,
1 0 1 1 0 0 0 0 0 0 0 0 0 - uc007aet.1
Each line consists of 3 tab-separated fields:
- A binary vector with one entry per segment defined in the "Segs" section, showing whether this annotation contains that segment. The entries are separated by spaces.
- The orientation of this annotation.
- The ID of this annotation.
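A Refs line can be turned into a list of segment indices, as in the Python sketch below. parse_ref_line is a hypothetical helper. It splits on any whitespace, so it accepts both tabs and spaces, and it assumes the annotation ID contains no whitespace. The returned indices are 0-based positions in the "Segs" list.

```python
def parse_ref_line(line: str):
    """Return (segment indices, orientation, annotation ID) for one Refs line."""
    tokens = line.split()
    if len(tokens) < 3:
        raise ValueError(f"too few fields in Refs line: {line!r}")
    # Everything except the last two tokens is the binary segment vector.
    *flags, orientation, ref_id = tokens
    if any(flag not in ("0", "1") for flag in flags):
        raise ValueError(f"non-binary segment indicator in: {line!r}")
    segments = [i for i, flag in enumerate(flags) if flag == "1"]
    return segments, orientation, ref_id
```

For the example line, the annotation uc007aet.1 contains the first, third and fourth segments.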
Reads
One line: the word "Reads" followed by the number of reads in this instance.
SGTypes
This section defines the different single-end read types.
The first line consists of the word "SGTypes" and an integer n showing
the number of defined types, followed by n lines defining each type.
For example,
1 0 1 0 0 0 0 0 0 0 0 0 0 7 -1
Each line consists of 2 tab-separated fields:
- (k+1) integers separated by spaces, where k is the number of segments.
The first k integers are binary and show whether this read type contains
each segment defined in the "Segs" section (similar to the records in the
"Refs" section). The last integer is the number of reads of this type.
- The orientation of this read type, if it is a splice-junction read type; 0 if the orientation cannot be determined.
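The layout of an SGTypes line can be sketched in Python as follows. parse_sgtype_line is a hypothetical helper, and the number of segments is assumed to be known already from the "Segs" line. The function splits on any whitespace and returns 0-based segment indices.

```python
def parse_sgtype_line(line: str, n_segments: int):
    """Return (segment indices, read count, orientation) for one SGTypes line."""
    tokens = line.split()
    # n_segments binary flags + read count + orientation
    if len(tokens) != n_segments + 2:
        raise ValueError(
            f"expected {n_segments + 2} integers, got {len(tokens)}: {line!r}"
        )
    values = [int(t) for t in tokens]
    flags = values[:n_segments]
    count = values[n_segments]
    orientation = values[n_segments + 1]
    segments = [i for i, flag in enumerate(flags) if flag == 1]
    return segments, count, orientation
```

For the example line, this yields a read type spanning the first and third segments, with 7 reads and orientation -1. In the example instance, the read counts of the 19 SGTypes add up to 468, the value on the "Reads" line.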
PETypes
This section defines the paired-end read types, if paired-end RNA-Seq reads are provided.
The first line consists of three fields: the word "PETypes", the number
of paired-end reads, and an integer n giving the number of paired-end
types. The first line is followed by 2*n lines, two for each
paired-end type. For example,
1 1 16
-56:1 -45:2 -43:1 -42:2 -31:2 -25:1 -24:1 -23:1 -15:1 -3:1 -1:1 1:1 5:2 16:1 54:1 79:1
The first line consists of 3 numbers: the SGType of the first read of
the paired-end read type (beginning from 1), the SGType of the second
read of the paired-end read type, and the number of fields in the
second line.
The second line lists the distances between the two reads of a pair, each
followed by the number of paired-end reads at that distance. For example,
"-56:1" means there is 1 paired-end read whose two reads are -56 bp apart.
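A two-line PETypes record can be parsed as in the Python sketch below. parse_pe_type is a hypothetical helper that takes the two lines of one record as separate strings. It assumes whitespace-separated fields and integer distances, and it checks the field count declared on the first line.

```python
def parse_pe_type(header: str, body: str):
    """Return (first SGType, second SGType, {distance: read count})."""
    header_fields = header.split()
    if len(header_fields) != 3:
        raise ValueError(f"expected 3 numbers in header: {header!r}")
    first, second, n_fields = (int(x) for x in header_fields)

    fields = body.split()
    if len(fields) != n_fields:
        raise ValueError(f"expected {n_fields} fields, got {len(fields)}")

    distances = {}
    for field in fields:
        # Each field has the form "distance:count", e.g. "-56:1".
        dist, count = field.split(":")
        distances[int(dist)] = int(count)
    return first, second, distances
```

Summing the counts of the returned dictionary gives the number of paired-end reads of that type.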
Coverage
This section gives more detailed coverage statistics for each SGType.
The first line consists of three fields: the word "Coverage", the
number of records following this line (i.e., the number of SGTypes),
and the total number of reads used for calculating the coverage
(generally the same as the "Reads" field).
If there are n SGTypes, 2*n lines follow the first line, two for each SGType. For example,
0 2
1,43 2,1
The first line consists of 2 numbers: the index of the SGType (beginning from 0) and the number of fields in the second line.
The second line lists "coverage,count" pairs: a coverage value and the number of bases having this coverage. In the example, 43 bases have coverage 1 and 1 base has coverage 2.
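A Coverage record can be read into a small histogram, as in the Python sketch below. parse_coverage_record is a hypothetical helper that takes the two lines of one record as separate strings. It assumes whitespace-separated fields and integer coverage values, as in the example.

```python
def parse_coverage_record(header: str, body: str):
    """Return (SGType index, {coverage value: number of bases})."""
    header_fields = header.split()
    if len(header_fields) != 2:
        raise ValueError(f"expected 2 numbers in header: {header!r}")
    index, n_fields = (int(x) for x in header_fields)

    fields = body.split()
    if len(fields) != n_fields:
        raise ValueError(f"expected {n_fields} fields, got {len(fields)}")

    histogram = {}
    for field in fields:
        # Each field has the form "coverage,bases", e.g. "1,43".
        coverage, bases = field.split(",")
        histogram[int(coverage)] = int(bases)
    return index, histogram
```

For the example record, the histogram is {1: 43, 2: 1}, so 44 bases are listed in total for SGType 0.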