IsoLasso is an algorithm to assemble transcripts and estimate their expression levels from RNA-Seq reads.
The latest version, IsoLasso v 2.6.1, can be downloaded here. (Last update: 11/17/2012). The full paper can be downloaded here.
Due to the wide popularity of the IsoLasso C++ version, the MATLAB version of IsoLasso has been discontinued since v 2.5. However, you can still find the MATLAB scripts of IsoLasso in previous versions.
NOTE: if you have compilation problems related to the CGAL or GSL libraries, go to Section 3.2.1 for alternative solutions.
IsoLasso currently runs on Linux. Earlier versions required Matlab with the Optimization Toolbox installed on your Linux system; the Matlab environment is no longer required. To run IsoLasso more conveniently, it is suggested (but not required) that Python 2.7 or a higher version be installed.
The source code consists mainly of two parts: Matlab code (in the matlab folder) and C++ code (in the src folder). The main algorithm was originally implemented in Matlab and has now been ported to C++ thanks to Yingsheng (Daniel) Gao. Other preprocessing tools are written in C++. A Python script, runlasso.py, is used as the main entry point of the program.
To handle BAM files (the default output format of many read mapping tools), you need to install SAMtools.
If you have the Matlab environment, no third-party libraries are required beyond the standard C++ library. However, if you want IsoLasso to run without Matlab, the C++ code relies on the GSL (http://www.gnu.org/s/gsl/) and CGAL (http://www.cgal.org/) libraries. A GCC version later than 4.3 is needed to compile the code.
For most Debian/Ubuntu systems, both libraries are provided as standard packages (libgsl-dev, libcgal-dev) and are easy to install.
If you don't have root privileges to install both packages, or you encounter "file/library not found" errors, you may need to download the source code of the libraries, then compile and install them yourself.
Note: We provide an alternative program, CEM, if you really don't want the GSL or CGAL libraries. CEM shares much of the IsoLasso code but uses the EM algorithm instead of the quadratic program to estimate isoform expressions. CEM does not use any library functions from GSL or CGAL.
Compiling requires the header files of both packages, and linking and running rely on two library files: libgsl.so (GSL library) and libCGAL.so (CGAL library). You need to modify two environment variables, CXXFLAGS and LD_LIBRARY_PATH.
CXXFLAGS is used for compiling and linking. After installation, locate the header files and the library files (the library folder should include libCGAL.so), and specify them using the "-I" and "-L" options in CXXFLAGS, respectively. For example, if the header files are in /home/me/include and the library files are in /home/me/lib, then run the following command in the shell (or put it in the .bashrc file in your home folder):
export CXXFLAGS="$CXXFLAGS -I/home/me/include -L/home/me/lib"
For running the program, modify the LD_LIBRARY_PATH as follows:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/me/lib
The IsoLasso source code includes a copy of the CGAL library
(in isolassocpp/CGAL/lib). You don't need to download or compile the
CGAL code; just copy the files to another location and set up the CXXFLAGS
and LD_LIBRARY_PATH variables as above.
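Before compiling, it can help to confirm that the library folder really contains the two files named above. The following is a minimal sketch, not part of IsoLasso: the function names missing_libraries and export_lines are hypothetical, and /home/me/include and /home/me/lib are the example folders used above.

```python
import os


def missing_libraries(lib_dir, names=("libgsl.so", "libCGAL.so")):
    """Return the library file names from `names` that are not found in lib_dir."""
    return [n for n in names if not os.path.isfile(os.path.join(lib_dir, n))]


def export_lines(include_dir, lib_dir):
    """Build the two shell export lines described above for the given folders."""
    return [
        'export CXXFLAGS="$CXXFLAGS -I{0} -L{1}"'.format(include_dir, lib_dir),
        "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:{0}".format(lib_dir),
    ]


if __name__ == "__main__":
    lib_dir = "/home/me/lib"  # example folder from the text
    print("missing library files:", missing_libraries(lib_dir))
    for line in export_lines("/home/me/include", lib_dir):
        print(line)
```

The script only checks for files and prints the export lines; it does not change your environment, so you still need to run the printed lines in your shell or add them to .bashrc.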
The src folder in the source code includes some programs written in C++. Simply run make in that folder to compile them:
make
If Python 2.7 or higher is installed on your system, you can use runlasso.py (in the scripts/bin folder) to run IsoLasso conveniently. You need to edit runlasso.py to set some path definitions, including the path of your MATLAB program and the path of your source folder. See runlasso.py for more details.
For your convenience, you may add the location of the bin directory to your PATH variable. For example, if IsoLasso is installed in /home/me/isolasso, then add the following line to the .bashrc file in your home folder:
export PATH="/home/me/isolasso/bin:$PATH"
After compiling, you can run IsoLasso using runlasso.py on your read alignment file (.sam or .bam file). If you only have the original RNA-Seq reads, you first need to map them to a reference genome using a read mapping tool such as TopHat or SpliceMap.
After that, run "runlasso.py sam/bam" to run IsoLasso. For example, if your file is test.sam, type
runlasso.py {options} test.sam
to run IsoLasso. This command consists of two steps. First, runlasso.py uses the processsam program to pre-process the sam/bam file and generate an instance file. Then, runlasso.py calls another program, isolasso, to process this instance file and output the assembled transcripts. You can also use
runlasso.py test.instance
to run IsoLasso directly on the test.instance file generated by processsam.
Run "runlasso.py", "processsam" and "isolasso" without providing any parameters to see their usages.
Note: If you want IsoLasso to only calculate the expression levels of given transcripts (provided in BED format), use the following command:
runlasso.py -x <BED> --forceref test.sam
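If you call IsoLasso from your own Python code, the command lines above can be assembled as argument lists. This is a minimal sketch under stated assumptions: runlasso.py is on your PATH, the helper names runlasso_command and run_isolasso are hypothetical, and known.bed is a placeholder file name.

```python
import subprocess


def runlasso_command(input_file, options=()):
    """Return the argument list for: runlasso.py {options} input_file."""
    return ["runlasso.py"] + list(options) + [input_file]


def run_isolasso(input_file, options=()):
    """Run runlasso.py (assumed to be on PATH); raises if it exits with an error."""
    subprocess.check_call(runlasso_command(input_file, options))


# Assemble transcripts from an alignment file.
print(" ".join(runlasso_command("test.sam")))
# Only estimate expression levels of transcripts given in a BED file
# (known.bed is a hypothetical file name).
print(" ".join(runlasso_command("test.sam", ["-x", "known.bed", "--forceref"])))
```

Passing a list to subprocess avoids shell quoting problems when file names contain spaces.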
Click here to see a detailed description of the instance file generated by processsam program.
Usage: runlasso.py {options} < in.bam | in.sam | - >
This is the main entry point for IsoLasso. It processes a sam/bam file or an .instance file, and outputs the assembled transcripts. This script passes all options to the processsam and isolasso programs.
Usage: processsam {options} <in.sam|->
processsam generates the instance file required for IsoLasso.
Required input: A SAM format file containing the read mapping information, or '-' to read from standard input. See NOTE for further information.
Options:
-n/--isoinfer | Generate IsoInfer input files (.readinfo, .bound and .generange). |
-g/--min-gap-length <int> | The minimum length of the gap between two reads to be considered as separate genes. Default 0. |
-c/--min-read-num <int> | The minimum number of clustered reads to output. Default 4. |
-k/--max-pe-span <int> | The maximum paired-end span. Paired-end reads whose span exceeds this number will be discarded. Default 700000. |
-x/--annotation <string> | Provide an existing gene annotation file (in BED format). Adding this parameter will automatically incorporate the existing gene annotation information into the instance file. The BED file should be sorted by chromosome name and starting position of the isoforms. This option is mutually exclusive with the -r/--range option. |
-r/--range <string> | Use the gene ranges specified by the file (in BED format). This option is mutually exclusive with the -x/--annotation option. |
-e/--segment-bound <string> | Provide the exon-intron boundary information specified by the filename. See NOTE for more information about the file format. |
-s/--max-num-instance | The maximum number of instances to be written to the file. Default -1 (no limit). |
-u/--min-cvg-cut <0.0-1.0> | The fraction used as the coverage cutoff; should be between 0 and 1. A higher value makes processsam more sensitive to coverage discrepancies within one gene. Default 0.05. |
-b/--single-only | Treat reads as single-end reads, even if they are paired-end reads. |
-j/--min-junc-count <int> | Minimum junction count. Only junctions with no less than this number of supporting reads are considered. Default 1. |
-a/--annotation | Output annotation files, including read coverage (.real.wig), read coverage considering junctions and paired-end read spans (.wig), instance range and boundary (.bound.bed), junctions (.bed) and junction summary (.junction.bed). |
-v/--no-coverage | Don't output coverage information to the instance file. |
-o/--prefix <string> | Specify the prefix of all generated files. The default value is the provided file name. |
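The -x/--annotation option asks for a BED file sorted by chromosome name and starting position. A minimal sketch for checking or producing that order is shown here; it is not part of IsoLasso. It assumes chromosome names are compared as plain strings and that the first two BED columns are the chromosome and the start position, and the helper names are hypothetical.

```python
def is_data_line(line):
    """True for BED records; skips blank lines, comments and track/browser lines."""
    s = line.strip()
    return bool(s) and not s.startswith(("#", "track", "browser"))


def bed_sort_key(line):
    """Sort key for a BED record: (chromosome name, numeric start position)."""
    fields = line.split()
    return (fields[0], int(fields[1]))


def sort_bed_lines(lines):
    """Return the BED records sorted by chromosome name, then start position."""
    return sorted((l for l in lines if is_data_line(l)), key=bed_sort_key)


def is_sorted_bed(lines):
    """Check whether the BED records are already in the required order."""
    keys = [bed_sort_key(l) for l in lines if is_data_line(l)]
    return keys == sorted(keys)


records = ["chr2\t100\t200\n", "chr1\t500\t900\n", "chr1\t50\t80\n"]
print(is_sorted_bed(records))
print("".join(sort_bed_lines(records)), end="")
```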
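NOTE:
To process a BAM file, pipe the output of samtools view into processsam and use '-' as the input file:
samtools view accepted_hits.bam chr1 | processsam -a -o accepted_hits -
If the reads in the SAM file are not sorted by chromosome name and position, use
sort -k 3,3 -k 4,4n in.sam > in.sorted.sam
to sort in.sam into in.sorted.sam, or use the pipe:
sort -k 3,3 -k 4,4n in.sam | processsam -a -o accepted_hits -
An example line of the exon-intron boundary file given to the -e/--segment-bound option:
chr1 15796 15796 +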
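The sort command in the note above orders SAM records by column 3 (reference name) and then numerically by column 4 (position). The same ordering can be sketched in Python as follows. This is an illustration only, with hypothetical helper names; unlike the plain sort command, it keeps header lines (starting with '@') at the top, and it compares chromosome names as plain strings, which may differ from your locale's sort order.

```python
def sam_sort_key(line):
    """Sort key for a SAM record: (reference name, numeric leftmost position)."""
    fields = line.rstrip("\n").split("\t")
    return (fields[2], int(fields[3]))


def sort_sam_lines(lines):
    """Return header lines first, then records sorted by reference and position."""
    header = [l for l in lines if l.startswith("@")]
    reads = [l for l in lines if l.strip() and not l.startswith("@")]
    return header + sorted(reads, key=sam_sort_key)


sam = [
    "@HD\tVN:1.0\n",
    "r1\t0\tchr2\t10\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r2\t0\tchr1\t200\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r3\t0\tchr1\t30\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
]
for line in sort_sam_lines(sam):
    print(line.split("\t")[0])
```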
Parameters: | |
-p/--pairend <int,int> | Specify the paired-end read span and standard deviation. Default 200,20. You may use this Python script to estimate both values from a given SAM/BAM file. |
-c/--min-read-num <int> | The minimum number of clustered reads to output. Default 0. |
IO Options: | |
--minexp <float> | The minimum expression level cutoff. Default 0.1. |
--verbose | Enable verbose output. |
-o/--prefix <string> | Specify the prefix of the output files. The default value is the instance file. |
--no-filter | Do not filter isoforms with 0 expression levels. If this option is on, the predicted expression levels of some isoforms will be 0. |
--id <string> | Only predict the instance with specified ID. |
Reference Options: | |
-d/--directref | Output gene annotation (the Refs field in the instance file) directly. All expression levels are assigned 1. |
--forceref | Calculate the expression levels of gene annotations (the Refs field in the instance file). Using this option will automatically turn on the '--no-filter' option. |
CEM Options: | |
--useem | Use EM algorithm instead of LASSO algorithm (which is default) to estimate expression levels. |
--usebias | Use quasi-multinomial bias correction. |
--elim | Allow CEM to eliminate low probability isoforms during the iteration. |
--correctn | Correct the gene read counts according to the quasi-multinomial bias parameter.
Warning: this is an experimental option so use it at your own risk. Due to the sample uncertainty, the calculation of the bias parameter may skew the distribution of some of the highly expressed genes. |
--alpha <float> | Specify the parameter of the negative Dirichlet prior. Default 5. |
--min-frac <float> | The minimum fraction of isoforms to be reported. Default 0.01. This option has no effect if the --no-filter option is set. |
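The two values expected by -p/--pairend (span and standard deviation) can be estimated from the alignments. The sketch below is not the script distributed with IsoLasso; it is a simplified illustration that assumes the span can be approximated by the TLEN field (column 9) of properly paired SAM records. The function names and the max_span cutoff of 1000 are hypothetical. TLEN is measured on the genome, so pairs that span an intron inflate the estimate; the cutoff only limits that effect.

```python
import statistics


def estimate_span(sam_lines, max_span=1000):
    """Estimate (mean, standard deviation) of the paired-end span from SAM lines."""
    spans = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete SAM record
        flag = int(fields[1])
        tlen = int(fields[8])
        # 0x1: read is paired, 0x2: mapped in a proper pair.
        # A positive TLEN marks the leftmost mate, so each pair is counted once.
        if flag & 0x1 and flag & 0x2 and 0 < tlen <= max_span:
            spans.append(tlen)
    if len(spans) < 2:
        raise ValueError("need at least two properly paired reads")
    return statistics.mean(spans), statistics.stdev(spans)


def pairend_option(mean, stdev):
    """Format the estimate as the <int,int> value expected by -p/--pairend."""
    return "{0},{1}".format(int(round(mean)), int(round(stdev)))


def _record(name, flag, pos, tlen):
    """Build a minimal SAM record for the example below."""
    cols = [name, str(flag), "chr1", str(pos), "255", "4M", "=", "1", str(tlen), "ACGT", "IIII"]
    return "\t".join(cols) + "\n"


example = [_record("a", 99, 100, 180), _record("a", 147, 276, -180),
           _record("b", 99, 300, 200), _record("c", 99, 500, 220)]
print(pairend_option(*estimate_span(example)))
```

The formatted string can then be passed as the argument of -p/--pairend.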
2012.11.17 IsoLasso v 2.6.1
Updates:
2012.07.22 IsoLasso v 2.6.0
Updates:
2012.04.04 IsoLasso v 2.5.2
Updates:
2012.02.22 IsoLasso v 2.5
Updates:
2012.02.08 IsoLasso v 2.4.1
Updates:
2011.12.20 IsoLasso v 2.4
Updates:
2011.12.03 IsoLasso v 2.3
Updates:
2011.10.21 IsoLasso v 2.2
Updates:
2011.09.26 IsoLasso v 2.1
Updates:
2011.7.9 IsoLasso v 2.0
Some important updates:
2011.1.13 IsoLasso v 1.0
by Wei Li