Abstract:
|
The growth in data volume and variety drives the need for principled data integration methods that can analyze multiple sources of data simultaneously. To facilitate and improve such integrative data analyses, we develop a new statistical method, Integrated Principal Components Analysis (iPCA). iPCA is a generalization of classical Principal Components Analysis that uses a Kronecker covariance model based on the matrix-variate normal distribution to capture individual patterns within each dataset and joint patterns shared by multiple datasets. We develop our iPCA model by proposing several classes of estimators, characterizing their optimization-theoretic properties by showing that some achieve global optimality for a non-convex problem, and studying their statistical consistency. Our approach provides a firm model-based and theoretical foundation for dimension reduction, pattern recognition, visualization, and exploratory analysis of integrated data. We demonstrate the effectiveness of iPCA in simulations and real data examples, including an application to integrative genomics for Alzheimer’s disease.
|