MetaPhyl is a supervised classification method for metagenomic samples that takes advantage of the natural structure of microbial community data encoded by phylogenetic trees.
The C++ source code is available here.
The script to generate synthetic datasets for MetaPhyl is here.
Input data files and formats
OTU count data
A file with the OTU counts of each sample, one sample per line:
1 x1_1 x1_2 ... x1_n
2 x2_1 x2_2 ... x2_n
...
N xN_1 xN_2 ... xN_n
Here N is the number of samples, n is the number of OTUs, and xi_j is the number of reads in the i-th sample that belong to the j-th OTU.
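The count format above can be read with a few lines of standard C++. The sketch below is not taken from the MetaPhyl sources; the names `Sample` and `readOtuCounts` are hypothetical. It assumes whitespace-separated integer counts and checks that every row has the same number of OTUs (n):

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// One row of the OTU count file (hypothetical type, for illustration only).
struct Sample {
    std::string id;            // first column: the sample number (1 .. N)
    std::vector<long> counts;  // counts[j] = reads of this sample assigned to OTU j
};

// Reads the whole count table from a stream. Blank lines are skipped; a row with
// a non-numeric value or with a different number of counts than the first row
// raises an exception.
std::vector<Sample> readOtuCounts(std::istream& in) {
    std::vector<Sample> samples;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream row(line);
        Sample s;
        if (!(row >> s.id)) continue;  // blank line
        long c;
        while (row >> c) s.counts.push_back(c);
        // Extraction stopped before the end of the line: a token was not a number.
        if (!row.eof()) throw std::runtime_error("non-numeric count in sample " + s.id);
        if (!samples.empty() && s.counts.size() != samples.front().counts.size())
            throw std::runtime_error("sample " + s.id + " has a different number of OTUs");
        samples.push_back(std::move(s));
    }
    return samples;
}
```

To check a file on disk, pass an `std::ifstream` opened on it; `samples.size()` is then N and `samples[0].counts.size()` is n.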
Class labels
A file listing the samples that belong to each class, one class per line:
s1_1 s1_2 ...
s2_1 s2_2 ...
...
sK_1 sK_2 ...
Here K is the number of classes, and sk_1 sk_2 ... is the list of samples that belong to the k-th class.
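Because the labels file is organized by class rather than by sample, it is often convenient to invert it into a sample-to-class map. The sketch below does that and rejects a sample listed in two classes. It is an illustration, not MetaPhyl code: the function name `readClassLabels` is hypothetical, and it assumes that sample identifiers match the first column of the OTU count file and that blank lines do not define a class.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Maps each sample identifier to a 0-based class index: the samples on the
// k-th non-blank line of the labels file get class index k-1.
// (Hypothetical helper; skipping blank lines is an assumption.)
std::map<std::string, int> readClassLabels(std::istream& in) {
    std::map<std::string, int> classOf;
    std::string line;
    int k = 0;
    while (std::getline(in, line)) {
        std::istringstream row(line);
        std::string id;
        bool any = false;
        while (row >> id) {
            any = true;
            // emplace() returns false in .second if the sample was already assigned.
            if (!classOf.emplace(id, k).second)
                throw std::runtime_error("sample " + id + " is listed in more than one class");
        }
        if (any) ++k;
    }
    return classOf;
}
```

Comparing the keys of the returned map with the sample identifiers of the count file is a quick way to spot unlabeled or misspelled samples before training.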
Phylogenetic tree
A file containing the phylogenetic tree of the n OTUs in Newick format. The OTUs (the leaves of the tree) must be labeled with the numbers 0 to n-1.
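The leaf-labeling requirement can be checked before running MetaPhyl. The following is a minimal sketch, not part of MetaPhyl, with hypothetical function names. It handles only plain Newick: unquoted labels, optional branch lengths after `:` and optional internal-node labels after `)`; quoted labels and `[...]` comments are not supported.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Returns the leaf labels of a plain Newick string, in order of appearance.
// A label is a leaf label when it directly follows '(' or ','; labels that
// follow ')' belong to internal nodes and are skipped.
std::vector<std::string> newickLeafLabels(const std::string& tree) {
    const std::string delims = "(),:;";
    std::vector<std::string> leaves;
    char lastDelim = '\0';  // last structural character seen
    std::size_t i = 0;
    while (i < tree.size()) {
        char c = tree[i];
        if (std::isspace(static_cast<unsigned char>(c))) { ++i; continue; }
        if (c == ':') {  // skip the branch length
            ++i;
            while (i < tree.size() && delims.find(tree[i]) == std::string::npos) ++i;
            continue;
        }
        if (delims.find(c) != std::string::npos) { lastDelim = c; ++i; continue; }
        std::size_t start = i;  // read one label token
        while (i < tree.size() && delims.find(tree[i]) == std::string::npos &&
               !std::isspace(static_cast<unsigned char>(tree[i])))
            ++i;
        if (lastDelim == '(' || lastDelim == ',')
            leaves.push_back(tree.substr(start, i - start));
    }
    return leaves;
}

// True if the tree has exactly n leaves labeled "0", "1", ..., "n-1" (each once).
bool leavesAreZeroToNMinus1(const std::string& tree, int n) {
    std::vector<std::string> leaves = newickLeafLabels(tree);
    std::set<std::string> expected;
    for (int j = 0; j < n; ++j) expected.insert(std::to_string(j));
    std::set<std::string> found(leaves.begin(), leaves.end());
    return leaves.size() == static_cast<std::size_t>(n) && found == expected;
}
```

Here n would be the number of count columns in the OTU count file; the sketch assumes, without confirmation from this page, that leaf label j refers to the j-th OTU column.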
Command Line Options
MetaPhyl can be run in two modes: training (-train) and testing (-test), i.e. classification of new samples with a trained model. A help message describing the options can also be printed.
-d      Data file with the OTU counts of each sample (training and testing modes).
-l      Class labels file (training mode).
-t      Phylogenetic tree file in Newick format (training mode).
-o      Output file (training and testing modes).
-c      Model file produced by the training phase (its -o output), used as input in testing mode.
-w      Weight parameter (training mode), a value between 0 and 1.
-lambda Regularization parameter (training mode).
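When MetaPhyl is driven from a larger pipeline, a pre-flight check of the arguments can catch mistakes early. The sketch below is hypothetical and does not reproduce MetaPhyl's own parser. It is derived only from the option list and the example commands on this page, and it assumes that training needs -d, -t, -l and -o, that testing needs -d, -c and -o, that -w and -lambda are optional, and that every option other than -train/-test takes exactly one value.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Result of the (hypothetical) argument check.
struct MetaPhylArgs {
    std::string mode;                          // "-train" or "-test"
    std::map<std::string, std::string> opts;   // option -> value, e.g. "-w" -> "0.5"
};

MetaPhylArgs checkMetaPhylArgs(const std::vector<std::string>& args) {
    MetaPhylArgs r;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& a = args[i];
        if (a == "-train" || a == "-test") {
            if (!r.mode.empty()) throw std::invalid_argument("use only one of -train / -test");
            r.mode = a;
        } else if (i + 1 < args.size()) {
            r.opts[a] = args[++i];  // assumed: every other option takes one value
        } else {
            throw std::invalid_argument("missing value for " + a);
        }
    }
    if (r.mode.empty()) throw std::invalid_argument("either -train or -test is required");
    // Required options per mode, as used in the example commands (an assumption).
    const std::vector<std::string> required = r.mode == "-train"
        ? std::vector<std::string>{"-d", "-t", "-l", "-o"}
        : std::vector<std::string>{"-d", "-c", "-o"};
    for (const auto& opt : required)
        if (!r.opts.count(opt)) throw std::invalid_argument(r.mode + " needs " + opt);
    // The weight parameter must lie in [0, 1].
    auto w = r.opts.find("-w");
    if (w != r.opts.end()) {
        double value = std::stod(w->second);
        if (value < 0.0 || value > 1.0) throw std::invalid_argument("-w must be between 0 and 1");
    }
    return r;
}
```

For example, the training command shown on this page passes the check, while a testing command without -c is rejected.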
To train the model:
./MetaPhyl -train -d example/samples.txt -t example/tree.tre -l example/labels.txt -w 0.5 -lambda 1 -o example/out.txt
To classify new samples:
./MetaPhyl -test -d example/samples.txt -c example/out.txt -o example/result.txt