Abstract:
|
The advances in technology have lead to the collection of high dimensional data such as genomic data. Before applying the existing statistical methods on high dimensional data, principal component analysis (PCA) is often used to reduce dimensionality. Sparse PC loadings are usually desired in this situation for simplicity and better interpretation. Although PCA has been extended to produce sparse PC loadings, few methods take potential biological information into consideration. In this article, we propose two novel structured sparse PCA methods which not only have sparse solutions but also incorporate available biological information. Our simulation study demonstrates incorporating known biological information improves the performance of sparse PCA methods, and the proposed methods are robust to potential misspecification of the biological information. We further illustrate the performance of our methods in a Glioblastoma genomic data set.
|