A General Framework for Biclustering Gene Expression Data


Common clustering methods cannot be applied to gene expression data collected under many heterogeneous conditions, because the assumption that genes co-express under all conditions is too restrictive. In "A General Framework for Biclustering Gene Expression Data" (submitted for publication), we proposed a novel biclustering method that simultaneously identifies groups of genes and groups of conditions based on a universal merit, and which in principle can detect any type of bicluster. Our experiments show that the approach is versatile and promising.

Here is a Java implementation of our new method, UBCLUST 1.20 beta. The input file should be a plain text file delimited by whitespace (spaces or tabs), with no row or column names. Run java -jar ubc.jar to print the following usage message:
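Because the input must be a bare numeric matrix, it can help to check a data file before handing it to ubc.jar. The following is a minimal sketch, not part of UBCLUST; the class MatrixReader and its methods are hypothetical. It reads whitespace- or tab-delimited rows, rejects ragged rows, and fails with a NumberFormatException if a header or row name slipped into the file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reads a headerless, whitespace-delimited numeric matrix. */
public class MatrixReader {

    public static double[][] read(Path file) throws IOException {
        return parse(Files.readAllLines(file));
    }

    /** Parses lines of space- or tab-separated numbers; blank lines are skipped. */
    public static double[][] parse(List<String> lines) {
        List<double[]> rows = new ArrayList<>();
        int cols = -1;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) continue;
            String[] tokens = trimmed.split("\\s+"); // any run of spaces/tabs
            if (cols < 0) {
                cols = tokens.length;
            } else if (tokens.length != cols) {
                throw new IllegalArgumentException("Row " + rows.size() + " has "
                        + tokens.length + " values, expected " + cols);
            }
            double[] row = new double[cols];
            for (int j = 0; j < cols; j++) {
                // Throws NumberFormatException on non-numeric text such as a header.
                row[j] = Double.parseDouble(tokens[j]);
            }
            rows.add(row);
        }
        return rows.toArray(new double[0][]);
    }

    public static void main(String[] args) throws IOException {
        double[][] m = read(Path.of(args[0]));
        System.out.println(m.length + " rows x " + (m.length == 0 ? 0 : m[0].length) + " columns");
    }
}
```

Running java MatrixReader datafile before java -jar ubc.jar datafile reports the matrix dimensions, or stops at the first malformed line.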

Usage: java -jar ubc.jar [options] datafile
Options:
  -l <level>       discretization levels (default 128)
  -t <temperature> initial temperature (default 0.00001)
  -f <factor>      temperature factor (default 0.9)
  -e <estimator>   Kolmogorov complexity estimator
                     0 : Uniform Model (default)
                     1 : Constant Rows Model
                     2 : Additive Model
                     3 : Relaxed OPSM
  -k <runs>        run how many times the MCMC algorithm (default 1)
  -r               trace the MCMC algorithm
  -h               print this help message
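The -t and -f options control the cooling of the MCMC (simulated annealing) search. The exact schedule inside UBCLUST is not documented here, so the following is only a generic sketch, assuming a geometric schedule T_k = T0 * f^k and the standard Metropolis acceptance rule for a cost that is minimized. The class AnnealingSchedule and its methods are hypothetical illustrations, not UBCLUST's API.

```java
import java.util.Random;

/**
 * Generic simulated-annealing helpers (illustrative sketch, not UBCLUST's code).
 * Assumes a cost to be minimized and a geometric schedule T_k = T0 * f^k.
 */
public class AnnealingSchedule {
    private final double initialTemperature; // plays the role of -t
    private final double factor;             // plays the role of -f, must lie in (0, 1)

    public AnnealingSchedule(double initialTemperature, double factor) {
        if (initialTemperature <= 0 || factor <= 0 || factor >= 1) {
            throw new IllegalArgumentException("need T0 > 0 and 0 < f < 1");
        }
        this.initialTemperature = initialTemperature;
        this.factor = factor;
    }

    /** Temperature after k cooling steps. */
    public double temperature(int k) {
        return initialTemperature * Math.pow(factor, k);
    }

    /** Number of cooling steps until the temperature drops below minTemperature. */
    public int stepsUntil(double minTemperature) {
        if (minTemperature <= 0) throw new IllegalArgumentException("minTemperature must be > 0");
        int k = 0;
        while (temperature(k) >= minTemperature) k++;
        return k;
    }

    /** Metropolis rule: accept improvements; accept a worse move with probability exp(-delta / T). */
    public static boolean accept(double delta, double temperature, Random rng) {
        if (delta <= 0) return true;
        return rng.nextDouble() < Math.exp(-delta / temperature);
    }

    public static void main(String[] args) {
        AnnealingSchedule s = new AnnealingSchedule(0.00001, 0.9); // the documented defaults
        System.out.println("T after 10 steps: " + s.temperature(10));
    }
}
```

Under the fixed stopping temperature assumed by stepsUntil, multiplying T0 by 10 with f = 0.9 adds about ln 10 / ln(1/0.9), roughly 22, cooling steps, which illustrates why a larger initial temperature lengthens the annealing.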
Among these options, the initial temperature has an important influence on the annealing procedure. A larger initial temperature makes annealing take longer, but the algorithm is then more likely to return a globally optimal solution.

Each run of the algorithm returns only one bicluster. To obtain multiple biclusters, run the program several times or use the -k option. To find different types of biclusters, select a different Kolmogorov complexity estimator with the -e option. The output files row.txt and col.txt indicate the rows and columns of the bicluster(s) found: a 1 means that the corresponding row or column belongs to the bicluster.
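To inspect a result, the 0/1 flags in row.txt and col.txt can be turned into index lists and used to cut the bicluster out of the input matrix. The exact file layout is not specified above, so this sketch ASSUMES that each line holds one bicluster's flags separated by whitespace (one line per run when -k is greater than 1); if the files list one flag per line instead, join the lines first. The class BiclusterOutput is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical reader for row.txt / col.txt.
 * ASSUMPTION: each non-blank line holds one bicluster's 0/1 flags separated by whitespace.
 */
public class BiclusterOutput {

    /** For each non-blank line, returns the 0-based positions whose flag is 1. */
    public static List<int[]> memberIndices(List<String> lines) {
        List<int[]> result = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) continue;
            String[] flags = trimmed.split("\\s+");
            int[] members = new int[flags.length];
            int count = 0;
            for (int i = 0; i < flags.length; i++) {
                if (flags[i].equals("1")) members[count++] = i;
            }
            result.add(Arrays.copyOf(members, count));
        }
        return result;
    }

    /** Extracts the submatrix restricted to the given rows and columns. */
    public static double[][] submatrix(double[][] data, int[] rows, int[] cols) {
        double[][] sub = new double[rows.length][cols.length];
        for (int i = 0; i < rows.length; i++) {
            for (int j = 0; j < cols.length; j++) {
                sub[i][j] = data[rows[i]][cols[j]];
            }
        }
        return sub;
    }

    public static void main(String[] args) throws IOException {
        List<int[]> rows = memberIndices(Files.readAllLines(Path.of("row.txt")));
        List<int[]> cols = memberIndices(Files.readAllLines(Path.of("col.txt")));
        for (int b = 0; b < Math.min(rows.size(), cols.size()); b++) {
            System.out.println("Bicluster " + (b + 1) + ": "
                    + rows.get(b).length + " rows x " + cols.get(b).length + " columns");
        }
    }
}
```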

Please send comments and questions to Haifeng Li
