Abstract:
|
The modern age of data has enabled the characterization of numerous cancers on multiple omics platforms, which has spurred interest in new integrative methods of analysis. Principal component analysis (PCA) is a popular tool for dimension reduction; however, the challenge of rank selection is further complicated when multiple datasets are considered. We introduce a PCA-based approach for studying observational groups of data, which relies on an ANOVA-like decomposition that exploits the relationship between rank, noise level, and commonality via basic geometric properties. Notably, the resulting framework yields a novel rank selection procedure that is robust to heterogeneity across data groups and to high dimensionality. We demonstrate our method in simulations and in a data application for discovering biomarkers using gene expression data from five different cancers.
|