Abstract:
|
Several modern datasets take the form of bidimensionally linked matrices, in which multiple matrices share either rows or columns. For example, multiple molecular omics platforms measured for multiple sample cohorts are increasingly common in biomedical studies. We propose a flexible factorization of such bidimensionally linked data that allows for the simultaneous identification of covariate-driven effects and auxiliary structured variation. Our approach provides a decomposition of covariate effects and low-rank structure, each of which may be shared across any number of row sets (e.g., omics platforms) or column sets (e.g., sample cohorts). We use an objective function with a structured nuclear norm penalty, with penalty parameters chosen by random matrix theory. The objective gives the mode of the posterior distribution for an intuitive Bayesian model. We apply the method to pan-omics pan-cancer data from The Cancer Genome Atlas (TCGA), integrating data from several omics platforms and several cancer types.
|