Here is a Java implementation of our algorithm MEC 1.03. The input file should be a simple whitespace- (or tab-) delimited text file without row or column names. The output is the group number of each row. Please run java -jar mec.jar to get the usage as follows:
Usage: java -jar mec.jar [options] datafile
Options:
  -a <alpha>   alpha in alpha-entropy (default 1 for Shannon's entropy)
  -c <number>  number of clusters (default 2)
  -k <kernel>  kernel of Parzen window
                 0 : hypercube (default)
                 1 : Gaussian
  -w <width>   bandwidth of Parzen window (default 1.5)
               The bandwidth of the i-th dimension is set to w*sigma, where
               sigma is the estimated standard deviation of the i-th dimension.
  -i <I>       run k-means <I> times for initialization (default 1)
  -v           verbose mode
  -h           print this help message

Among the options, the width of the Parzen window is the most important. For datasets of high dimensionality, a large width (say, 2.5 or 3) is recommended. However, too large a width is also inappropriate: it makes the algorithm run slowly and produce poor results. The optimal width depends on the dataset. For the parameter c, the number of clusters, users should set a larger number than they actually expect; the extra clusters can be used to detect outliers. For the other parameters, the defaults should be fine for most datasets.
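The w*sigma bandwidth rule above can be sketched as follows. This is an illustration only, not code from mec.jar: the class and method names are made up, and the use of the unbiased (n-1) sample standard deviation is an assumption.

```java
public class Bandwidth {
    // Per-dimension Parzen-window bandwidth: h_j = w * sigma_j, where sigma_j
    // is the estimated standard deviation of dimension j.
    static double[] bandwidths(double[][] data, double w) {
        int n = data.length, d = data[0].length;
        double[] h = new double[d];
        for (int j = 0; j < d; j++) {
            double mean = 0.0;
            for (double[] row : data) mean += row[j];
            mean /= n;
            double ss = 0.0;
            for (double[] row : data) ss += (row[j] - mean) * (row[j] - mean);
            // sample standard deviation (n-1 denominator; an assumption)
            double sigma = Math.sqrt(ss / (n - 1));
            h[j] = w * sigma;
        }
        return h;
    }

    public static void main(String[] args) {
        // dimension 1 has sigma = 1, dimension 2 has sigma = 10
        double[][] data = {{1, 10}, {2, 20}, {3, 30}};
        double[] h = bandwidths(data, 1.5); // default width w = 1.5
        System.out.printf("h = [%.2f, %.2f]%n", h[0], h[1]); // h = [1.50, 15.00]
    }
}
```

Note how a dimension with larger spread automatically gets a proportionally larger bandwidth, which is why a single width parameter w can serve all dimensions.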
Recently, gene expression data with many heterogeneous conditions have appeared. Common clustering methods (including MEC) cannot be applied to these datasets because the assumption that genes co-express under all conditions is too restrictive. Please see UBCLUST for our universal biclustering algorithm, which simultaneously identifies groups of genes and groups of conditions.
The following Java applet demonstrates the MEC algorithm in comparison with k-means, deterministic annealing (DA) clustering, and self-organizing maps (SOMs) on several 2-dimensional synthetic datasets. k-means is employed to give the initial partition for our method.
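A k-means pass that could produce such an initial partition can be sketched as below. This is a generic Lloyd's-algorithm sketch, not the applet's actual code; the deterministic farthest-point seeding is an assumption chosen here for reproducibility.

```java
import java.util.Arrays;

public class KMeansInit {
    // Squared Euclidean distance between two points.
    static double sq(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            s += diff * diff;
        }
        return s;
    }

    // One k-means run (Lloyd's algorithm). Centers are seeded with a
    // deterministic farthest-point heuristic (an assumption, for
    // reproducibility; the real implementation may seed randomly).
    static int[] kmeans(double[][] x, int k, int iters) {
        int n = x.length, d = x[0].length;
        double[][] c = new double[k][];
        c[0] = x[0].clone();
        for (int j = 1; j < k; j++) {
            int far = 0;
            double farDist = -1;
            for (int i = 0; i < n; i++) {
                double dmin = Double.MAX_VALUE;
                for (int m = 0; m < j; m++) dmin = Math.min(dmin, sq(x[i], c[m]));
                if (dmin > farDist) { farDist = dmin; far = i; }
            }
            c[j] = x[far].clone();
        }
        int[] label = new int[n];
        for (int it = 0; it < iters; it++) {
            // assignment step: each point joins its nearest center
            for (int i = 0; i < n; i++) {
                double best = Double.MAX_VALUE;
                for (int j = 0; j < k; j++) {
                    double dist = sq(x[i], c[j]);
                    if (dist < best) { best = dist; label[i] = j; }
                }
            }
            // update step: each center moves to the mean of its members
            double[][] sum = new double[k][d];
            int[] cnt = new int[k];
            for (int i = 0; i < n; i++) {
                cnt[label[i]]++;
                for (int a = 0; a < d; a++) sum[label[i]][a] += x[i][a];
            }
            for (int j = 0; j < k; j++)
                if (cnt[j] > 0)
                    for (int a = 0; a < d; a++) c[j][a] = sum[j][a] / cnt[j];
        }
        return label;
    }

    public static void main(String[] args) {
        // two well-separated 2-dimensional blobs
        double[][] x = {{0, 0}, {0.1, 0.2}, {0.2, 0.1}, {5, 5}, {5.1, 4.9}, {4.9, 5.2}};
        System.out.println(Arrays.toString(kmeans(x, 2, 10))); // [0, 0, 0, 1, 1, 1]
    }
}
```

The resulting labels give MEC a coarse starting partition that the entropy minimization then refines.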
Please send comments and questions to Haifeng Li.
