Generalized Linear Discriminant Analysis
Conventional linear discriminant analysis (LDA) requires that the
within-class scatter matrix Sw be nonsingular. In many applications,
however, such as cancer classification with gene expression profiling,
face recognition, and web document classification, Sw is singular
because of the small sample size problem, i.e., the number of samples is
smaller than the dimensionality of the data. To solve this problem, we
propose generalized linear discriminant analysis (GLDA), a general,
direct, and complete solution for optimizing a modified Fisher criterion.
Unlike conventional LDA, GLDA does not assume that Sw is nonsingular and
therefore overcomes the small sample size problem. This is achieved by
carefully investigating the properties of the scatter matrices. GLDA is
mathematically well-founded and coincides with conventional LDA when
Sw is nonsingular. To accommodate very high-dimensional datasets, a fast
algorithm for GLDA is also developed. Extensive experiments on cancer
classification show that our method outperforms widely used classifiers
such as support vector machines, random forests, and k-nearest neighbors,
especially on datasets with many classes and very high dimensionality.
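To see why Sw becomes singular in the small sample size setting, note that Sw is a sum of outer products of centered samples, so its rank is at most n minus the number of classes, which is below the dimensionality p whenever n < p. The following Python check illustrates this (a sketch for illustration only; the paper's implementation is in R):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 10, 100                      # 10 samples, 100 features: n < p
X = rng.normal(size=(n, p))
y = np.repeat([0, 1], 5)            # two classes of 5 samples each

# within-class scatter: sum of centered outer products per class
Sw = np.zeros((p, p))
for c in np.unique(y):
    Xc = X[y == c] - X[y == c].mean(axis=0)
    Sw += Xc.T @ Xc

# rank(Sw) <= n - (number of classes) = 8 < p, so Sw is singular
rank = np.linalg.matrix_rank(Sw)
```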
Here is an R implementation of GLDA.
The function glda.predict(train.x, train.y, test.x) trains the model on
the training data <train.x, train.y> and predicts on the test data
test.x, where train.x is a matrix whose rows are samples, train.y is a
vector of the corresponding class labels, and test.x is the test data
matrix. The function returns a vector of predicted labels for the test
data.
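The R source itself is not reproduced here. As a rough illustration of the interface described above, the following Python sketch mimics glda.predict using the Moore-Penrose pseudoinverse of Sw to handle singularity; this pseudoinverse approach is an assumption for illustration and may differ in detail from the paper's GLDA derivation:

```python
import numpy as np

def glda_predict(train_x, train_y, test_x):
    """Sketch of a GLDA-style classifier (not the paper's exact algorithm).

    Builds scatter matrices, takes discriminant directions from the
    eigenvectors of pinv(Sw) @ Sb (pseudoinverse handles singular Sw),
    then classifies test samples by the nearest class centroid in the
    projected space.
    """
    classes = np.unique(train_y)
    p = train_x.shape[1]
    mean_all = train_x.mean(axis=0)

    # within-class (Sw) and between-class (Sb) scatter matrices
    Sw = np.zeros((p, p))
    Sb = np.zeros((p, p))
    for c in classes:
        Xc = train_x[train_y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        d = (mc - mean_all).reshape(-1, 1)
        Sb += len(Xc) * (d @ d.T)

    # discriminant directions: leading eigenvectors of pinv(Sw) @ Sb
    vals, vecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(-vals.real)
    W = vecs[:, order[: len(classes) - 1]].real

    # nearest-centroid classification in the projected space
    Z = train_x @ W
    centroids = np.array([Z[train_y == c].mean(axis=0) for c in classes])
    Zt = test_x @ W
    dists = ((Zt[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(dists, axis=1)]
```

As in the R interface, rows of train_x and test_x are samples and train_y holds the corresponding labels; the return value is the vector of predicted labels for test_x.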
Please send comments and questions to Haifeng Li.