Abstract:
|
The growth in data volume and variety drives the need for principled data integration methods that can analyze multiple sources of data simultaneously. To facilitate and improve such integrative analyses, we develop a new statistical method named Integrated Principal Components Analysis (iPCA). iPCA is a generalization of PCA and serves as a practical tool to find and visualize common patterns that occur in multiple datasets. The key idea driving iPCA is the matrix-variate normal model, whose Kronecker covariance structure captures both individual patterns within each dataset and joint patterns shared by multiple datasets. Building upon this model, we develop several penalized covariance estimators for iPCA and study their theoretical properties. We show that our sparse iPCA estimator consistently estimates the underlying joint subspace, and using geodesic convexity, we prove that our non-sparse iPCA estimator converges to the global solution of a non-convex problem. We also demonstrate the effectiveness of iPCA through simulations and real data examples, including a case study application to integrative genomics for Alzheimer's Disease.
|