ISP is written in C++ and can be run on Linux/Unix systems. ISP requires the following packages: GLPK and QuadProg.
The latest code (version 0.3, Oct. 15, 2014) can be downloaded here.
After downloading, the source code can be extracted using the following command: "tar xvzf isp.0.3.tar.gz" (for isp version 0.3).
The src folder of the extracted package contains the source code. Run make to compile it:
make
For convenience, you may add the location of the bin directory to your PATH variable. For example, if ISP is located in /home/me/isp, add the following line to the .bashrc file in your home folder:
export PATH="/home/me/isp/bin:$PATH"
Compilation requires the header files of both packages, and linking and running rely on two library files: libglpk.so (GLPK library) and libQuadProgpp.so (QuadProg library). Generally you don't need to worry about the library files, but if you encounter a "library not found" or "header not found" error during compilation or execution, you need to modify two environment variables, CXXFLAGS and LD_LIBRARY_PATH.
CXXFLAGS is used for compiling and linking. After installing both packages, locate their header files and library files (the latter should include libglpk.so and libQuadProgpp.so), and specify their directories with the "-I" and "-L" options in CXXFLAGS, respectively. For example, if the header files are in /home/me/include and the library files are in /home/me/lib, run the following command in the shell (or put it in the .bashrc file in your home folder):
export CXXFLAGS="$CXXFLAGS -I/home/me/include -L/home/me/lib"
To run the program, modify LD_LIBRARY_PATH as follows:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/me/lib
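If it is unclear whether the two library files are visible at run time, a quick check along the following lines may help. This is a minimal Python sketch and not part of ISP: the function name missing_libs is hypothetical, and the check only looks in the directories listed in LD_LIBRARY_PATH, not in system default locations, so a library reported as missing here may still be found by the system loader.

```python
import os

# Library file names taken from the text above.
REQUIRED_LIBS = ("libglpk.so", "libQuadProgpp.so")


def missing_libs(ld_library_path, required=REQUIRED_LIBS):
    """Return the required library files not found in any listed directory.

    ld_library_path is a colon-separated string such as the value of
    LD_LIBRARY_PATH. Empty entries are skipped.
    """
    dirs = [d for d in ld_library_path.split(":") if d]
    return [
        name
        for name in required
        if not any(os.path.isfile(os.path.join(d, name)) for d in dirs)
    ]


if __name__ == "__main__":
    missing = missing_libs(os.environ.get("LD_LIBRARY_PATH", ""))
    if missing:
        print("not found in LD_LIBRARY_PATH:", ", ".join(missing))
    else:
        print("both libraries found")
```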
In the test folder, a simple test case is provided. Run runisp.sh to test the program.
If the demo runs successfully, it creates a folder named out that stores all intermediate and final results. The final isoform predictions are located in the following files:
allpred.n.ins.pred | Isoform prediction file. Each line specifies one isoform. |
allpred.n.ins.pred.explv | The expression levels of isoforms in different samples. |
allpred.n.ins.pred.gtf | The structures of isoforms in GTF format. |
Here, n is the number of samples used (n=3 in our demo).
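To confirm that a run produced the three final files listed above, one could check for them by name. The following Python sketch is only an illustration: final_outputs and missing_outputs are hypothetical helper names, not part of ISP, and the result directory is passed in by the caller rather than assumed.

```python
import os


def final_outputs(n):
    """Names of the three final result files for a run with n samples."""
    base = f"allpred.{n}.ins.pred"
    return [base, base + ".explv", base + ".gtf"]


def missing_outputs(result_dir, n):
    """Return the final result files that are absent from result_dir."""
    return [
        name
        for name in final_outputs(n)
        if not os.path.isfile(os.path.join(result_dir, name))
    ]


# For the demo (n=3) this prints:
# ['allpred.3.ins.pred', 'allpred.3.ins.pred.explv', 'allpred.3.ins.pred.gtf']
print(final_outputs(3))
```

An empty list returned by missing_outputs means all three files are present.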
The flow algorithm is only one small part of the workflow. For assembly from multiple samples, one first needs to integrate information from the individual samples. Generally, for a given set of samples, there are 5 main steps:
allpred.n.ins | The multiple instance file (generated by step 4), recording all the information necessary to run the ISP algorithm. |
allbound.n.bed | The merged gene boundary (by step 2). |
alljunc.n.txt | The merged junction read statistics (by step 2). |
merged.n.bed | The merged gene boundary (by step 2). |
pthreadpred_i_n.pred/pthreadpred_i_n.pred.explv | The predicted isoform and isoform expression levels by different threads (by step 5). |
sample_i.1st.bed/.bound.bed/.instance/.junc.bed/.real.wig/.wig | The first scan results of individual samples (step 1), including junction reads, gene boundary, IsoLasso instance, junction read summary, the read coverage (without introns), and the read coverage (with introns). |
sample_i.2nd.n.instance | The second scan results of individual samples (step 3), stored as an IsoLasso-compatible instance file. |
Here, n is the number of samples used (n=3 in our demo), and i is the sample index (i=0,1,2 in our demo).
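The naming scheme in the table above can be spelled out for a concrete n. The Python sketch below groups the intermediate file names by the step number the table attributes them to. It is an illustration only: files_by_step is a hypothetical name, the default sample_i prefix is assumed, and the per-thread pthreadpred files of step 5 are left out because the table does not say how their index relates to the number of threads.

```python
# Suffixes of the first-scan files, in the order given in the table above.
FIRST_SCAN_SUFFIXES = (
    ".1st.bed", ".bound.bed", ".instance", ".junc.bed", ".real.wig", ".wig",
)


def files_by_step(n):
    """Map step numbers 1-4 to the intermediate files named in the table."""
    samples = range(n)  # sample indices i = 0 .. n-1, as in the demo
    return {
        1: [f"sample_{i}{s}" for i in samples for s in FIRST_SCAN_SUFFIXES],
        2: [f"allbound.{n}.bed", f"alljunc.{n}.txt", f"merged.{n}.bed"],
        3: [f"sample_{i}.2nd.{n}.instance" for i in samples],
        4: [f"allpred.{n}.ins"],
    }


if __name__ == "__main__":
    for step, names in files_by_step(3).items():
        print(f"step {step}: {len(names)} file(s), e.g. {names[0]}")
```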
runminst is the main entry point of the ISP program.
Usage:
runminst {OPTIONS} <BAM file 1|COMMAND 1> <BAM file 2|COMMAND 2> ...
Options:
-h/--help | Print help message. |
-r/--range [range] | Specify the range of the BAM file. Ranges are specified as 'chrname:start-end'; if multiple ranges are specified, they must be separated by commas (,). |
-t/--tmp [tmp] | Specify the temporary dir. Default tmp/ |
-p/--pthread [int] | Specify the number of threads used. Default 1. |
--debug [int,...] | Specify the steps executed. Each integer must be between 1 and 6. Default 1,2,3,4,5,6. |
-c/--command | Instead of providing BAM file names, use STDOUT of commands executable under the current path. |
-s/--sam | The file format is SAM instead of BAM. If it is on, -r/--range option is not allowed. |
-L/--label <string,...> | Specify the labels of the input files, separated by commas. The number of labels must equal the number of input files. Default "sample_n". |
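The constraints stated in the option table (the 'chrname:start-end' range format, no -r together with -s, one label per input file) can be checked before the program is launched. The Python sketch below assembles a runminst argument list using only the options documented above. The helper names parse_ranges and build_runminst_args and the file names in the usage example are hypothetical, and the sketch does not run the program itself.

```python
def parse_ranges(text):
    """Parse 'chr1:100-200,chr2:5-50' into [(chrom, start, end), ...]."""
    ranges = []
    for item in text.split(","):
        chrom, _, span = item.rpartition(":")
        start, _, end = span.partition("-")
        if not chrom or not start.isdigit() or not end.isdigit():
            raise ValueError(f"bad range: {item!r}")
        ranges.append((chrom, int(start), int(end)))
    return ranges


def build_runminst_args(inputs, ranges=None, threads=1, labels=None, sam=False):
    """Return an argument list for runminst, enforcing the documented rules."""
    if sam and ranges:
        raise ValueError("-r/--range is not allowed together with -s/--sam")
    if labels is not None and len(labels) != len(inputs):
        raise ValueError("the number of labels must equal the number of inputs")
    args = ["runminst"]
    if ranges:
        parse_ranges(ranges)  # raises ValueError if the format is wrong
        args += ["-r", ranges]
    if sam:
        args.append("-s")
    args += ["-p", str(threads)]
    if labels:
        args += ["-L", ",".join(labels)]
    return args + list(inputs)


if __name__ == "__main__":
    # a.bam and b.bam are placeholder file names.
    print(" ".join(build_runminst_args(
        ["a.bam", "b.bam"], ranges="chr1:1000-2000", threads=2,
        labels=["ctrl", "treat"])))
```

The resulting list can be passed to subprocess.run if ISP's bin directory is on PATH.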
predminst is the core algorithm of the ISP program (step 5).
Usage:
predminst {OPTIONS} <instance file>
Options:
-h | Print help information |
-i [ID] | Predict only the instance with specified ID. |
-o [file] | The prefix of the output file. |
-p x,y | Only predict instances whose ID matches the pattern x,y, that is, IDs x, x+y, x+2y, and so on. For example, if (x,y)=(3,4), the program will only predict instances with ID 3,7,11,15,... |
--min-frac [0.0-1.0] | Only predict isoforms with expression greater than this fraction of the most abundant isoform. Default 0.1. |
--rd-alpha [0.0-1.0] | The weights for read supporting variables. Default 0.1 |
--assemble-by-sample | Assemble the transcripts sample-by-sample. This will lead to higher sensitivity but lower precision. |
--no-correlation | Do not perform segment correlation. |
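Two of the options above, -p x,y and --min-frac, describe simple selection rules that are easy to state in code. The Python sketch below illustrates both under stated assumptions and is not predminst's actual implementation: the function names are hypothetical, the ID rule is inferred from the example 3,7,11,15 in the table, and the fraction is assumed to be taken relative to the most abundant isoform within one set of isoforms passed in by the caller.

```python
def matches_pattern(instance_id, x, y):
    """True if instance_id is one of x, x+y, x+2y, ..."""
    return instance_id >= x and (instance_id - x) % y == 0


def filter_by_min_frac(expression, min_frac=0.1):
    """Keep isoforms whose expression exceeds min_frac times the maximum.

    expression maps an isoform name to its expression level.
    """
    if not expression:
        return {}
    threshold = min_frac * max(expression.values())
    return {name: e for name, e in expression.items() if e > threshold}


if __name__ == "__main__":
    # IDs selected by -p 3,4 among 1..16
    print([i for i in range(1, 17) if matches_pattern(i, 3, 4)])
    # Made-up expression levels; the default 0.1 gives a threshold of 5.0
    print(filter_by_min_frac({"iso1": 50.0, "iso2": 6.0, "iso3": 4.0}))
```

With the made-up levels in the usage lines, iso3 falls below one tenth of iso1 and is dropped, while iso1 and iso2 are kept.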
2014.10.15, version 0.3