Here is a Java implementation of our algorithm MEC 1.03. The input file should be a simple whitespace- (or tab-) delimited text file without row or column names. The output is the group number of each row. Please run java -jar mec.jar to get the usage as follows:
Usage: java -jar mec.jar [options] datafile
Options:
  -a <alpha>   alpha in alpha-entropy (default 1 for Shannon's entropy)
  -c <number>  number of clusters (default 2)
  -k <kernel>  kernel of Parzen window
                 0 : hypercube (default)
                 1 : Gaussian
  -w <width>   bandwidth of Parzen window (default 1.5)
               The bandwidth of the i-th dimension is set to w*sigma, where
               sigma is the estimated standard deviation of the i-th dimension.
  -i <I>       run k-means <I> times for initialization (default 1)
  -v           verbose mode
  -h           print this help message

Among the options, the width of the Parzen window is the most important. For datasets of high dimensionality, a large width (say, 2.5 or 3) is recommended. However, too large a width is also inappropriate: it makes the algorithm run slowly and produce poor results. The optimal width depends on the dataset. For the parameter c, the number of clusters, users should set a larger number than they actually expect; the extra clusters can be used to detect outliers. For the other parameters, the defaults should be fine for most datasets.
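The w*sigma bandwidth rule above can be sketched as follows. This is an illustration only, not code from mec.jar: the class and method names are made up, and the use of the unbiased (n-1) sample standard deviation is an assumption.

```java
public class Bandwidth {
    // Per-dimension Parzen-window bandwidth: h_j = w * sigma_j, where sigma_j
    // is the estimated standard deviation of dimension j.
    static double[] bandwidths(double[][] data, double w) {
        int n = data.length, d = data[0].length;
        double[] h = new double[d];
        for (int j = 0; j < d; j++) {
            double mean = 0.0;
            for (double[] row : data) mean += row[j];
            mean /= n;
            double ss = 0.0;
            for (double[] row : data) ss += (row[j] - mean) * (row[j] - mean);
            // sample standard deviation (n-1 denominator; an assumption)
            double sigma = Math.sqrt(ss / (n - 1));
            h[j] = w * sigma;
        }
        return h;
    }

    public static void main(String[] args) {
        // dimension 1 has sigma = 1, dimension 2 has sigma = 10
        double[][] data = {{1, 10}, {2, 20}, {3, 30}};
        double[] h = bandwidths(data, 1.5); // default width w = 1.5
        System.out.printf("h = [%.2f, %.2f]%n", h[0], h[1]); // h = [1.50, 15.00]
    }
}
```

Note how a dimension with larger spread automatically gets a proportionally larger bandwidth, which is why a single width parameter w can serve all dimensions.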
Recently, gene expression data with many heterogeneous conditions have appeared. Common clustering methods (including MEC) cannot be applied to these datasets because the assumption that genes co-express under all conditions is too restrictive. Please see UBCLUST for our universal biclustering algorithm, which simultaneously identifies groups of genes and groups of conditions.
The following Java applet demonstrates the MEC algorithm in comparison with k-means, deterministic annealing (DA) clustering, and self-organizing maps (SOMs) on several 2-dimensional synthetic datasets. k-means is employed to give the initial partition for our method.
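A k-means pass that could produce such an initial partition can be sketched as below. This is a generic Lloyd's-algorithm sketch, not the applet's actual code; the deterministic farthest-point seeding is an assumption chosen here for reproducibility.

```java
import java.util.Arrays;

public class KMeansInit {
    // Squared Euclidean distance between two points.
    static double sq(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            s += diff * diff;
        }
        return s;
    }

    // One k-means run (Lloyd's algorithm). Centers are seeded with a
    // deterministic farthest-point heuristic (an assumption, for
    // reproducibility; the real implementation may seed randomly).
    static int[] kmeans(double[][] x, int k, int iters) {
        int n = x.length, d = x[0].length;
        double[][] c = new double[k][];
        c[0] = x[0].clone();
        for (int j = 1; j < k; j++) {
            int far = 0;
            double farDist = -1;
            for (int i = 0; i < n; i++) {
                double dmin = Double.MAX_VALUE;
                for (int m = 0; m < j; m++) dmin = Math.min(dmin, sq(x[i], c[m]));
                if (dmin > farDist) { farDist = dmin; far = i; }
            }
            c[j] = x[far].clone();
        }
        int[] label = new int[n];
        for (int it = 0; it < iters; it++) {
            // assignment step: each point joins its nearest center
            for (int i = 0; i < n; i++) {
                double best = Double.MAX_VALUE;
                for (int j = 0; j < k; j++) {
                    double dist = sq(x[i], c[j]);
                    if (dist < best) { best = dist; label[i] = j; }
                }
            }
            // update step: each center moves to the mean of its members
            double[][] sum = new double[k][d];
            int[] cnt = new int[k];
            for (int i = 0; i < n; i++) {
                cnt[label[i]]++;
                for (int a = 0; a < d; a++) sum[label[i]][a] += x[i][a];
            }
            for (int j = 0; j < k; j++)
                if (cnt[j] > 0)
                    for (int a = 0; a < d; a++) c[j][a] = sum[j][a] / cnt[j];
        }
        return label;
    }

    public static void main(String[] args) {
        // two well-separated 2-dimensional blobs
        double[][] x = {{0, 0}, {0.1, 0.2}, {0.2, 0.1}, {5, 5}, {5.1, 4.9}, {4.9, 5.2}};
        System.out.println(Arrays.toString(kmeans(x, 2, 10))); // [0, 0, 0, 1, 1, 1]
    }
}
```

The resulting labels give MEC a coarse starting partition that the entropy minimization then refines.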
Please send comments and questions to Haifeng Li.
