Abstract:
With the development of high-throughput biomedical technologies, principal component analysis (PCA) in the high-dimensional regime is of great interest. Existing methods for estimating population eigenvalues, eigenvectors, and PC scores are based on a spiked eigenvalue model, in which all population eigenvalues equal one except for a few large ones. In real data, this assumption may not hold because of local correlation among features. We propose a novel method that consistently estimates population eigenvalues without the spiked eigenvalue assumption. Our method combines two existing algorithms: one estimates the large eigenvalues, and the other estimates the distribution of the remaining eigenvalues. Based on the consistent estimator of population eigenvalues, we construct estimators of the angles between sample and population eigenvectors, the correlation coefficients between sample and population PC scores, and the shrinkage factors of the predicted PC scores. We also provide theoretical justification for the proposed methods using random matrix theory. Extensive simulation studies and real data examples from genetics show the superior performance of our method.
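As a small illustration of why sample eigenvalues need correction in this regime, the sketch below simulates the classical spiked model (not the proposed method) and compares the largest sample eigenvalue with its population value. The function names, the Gaussian data, and the settings n = 200, p = 1000, spike = 10 are assumptions chosen for illustration only. The limit used is the standard random matrix theory result for a single spike above the threshold 1 + sqrt(p/n).

```python
import numpy as np


def top_sample_eigenvalue(n=200, p=1000, spike=10.0, seed=0):
    """Largest eigenvalue of the sample covariance under a one-spike model.

    Population covariance is diag(spike, 1, ..., 1); the mean is known to be
    zero, so no centering is applied. All settings are illustrative.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    X[:, 0] *= np.sqrt(spike)  # inflate the variance of the first feature
    # The nonzero eigenvalues of X^T X / n equal those of the smaller
    # n-by-n matrix X X^T / n, which is cheaper to decompose when p > n.
    gram = X @ X.T / n
    return float(np.linalg.eigvalsh(gram)[-1])


def spiked_limit(spike, gamma):
    """Asymptotic limit of the top sample eigenvalue for a spike above
    1 + sqrt(gamma), where gamma = p / n."""
    return spike * (1.0 + gamma / (spike - 1.0))


if __name__ == "__main__":
    n, p, spike = 200, 1000, 10.0
    print("population eigenvalue:", spike)
    print("top sample eigenvalue:", top_sample_eigenvalue(n, p, spike))
    print("asymptotic limit:     ", spiked_limit(spike, p / n))
```

In this setting the top sample eigenvalue is inflated well above the population value, which is the bias that spiked-model estimators invert. When the remaining population eigenvalues are not all one, that inversion no longer applies directly.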
ASA Meetings Department
732 North Washington Street, Alexandria, VA 22314
(703) 684-1221 • meetings@amstat.org
Copyright © American Statistical Association.