Abstract:
|
Principal component analysis (PCA) is a commonly used statistical method in a wide range of applications. However, it does not perform well when the number of features exceeds the sample size. Moreover, it is unclear how to properly handle incomplete data in PCA. We consider estimation of the sparse principal subspace in the high-dimensional setting with missing data. We propose a two-step estimation procedure and establish rates of convergence for estimating the principal subspace. Simulated examples show that the procedure performs competitively with existing sparse PCA methods. We also apply the method to single-cell data, which typically have many missing values, and show that the proposed method distinguishes cell types better than other PCA methods.
|