Abstract:
|
Advances in molecular "omics" technologies have motivated new methods for integrating multiple sources of high-content biomedical data. However, most methods for integrating multiple data matrices consider only data shared vertically (one cohort on multiple platforms) or horizontally (different cohorts on a single platform). This is limiting for data that take the form of bidimensionally linked matrices (e.g., multiple cohorts measured on multiple platforms), which are increasingly common in biomedical studies. We propose BIDIFAC (Bidimensional Integrative Factorization) for integrative dimension reduction and signal approximation of bidimensionally linked data matrices. Our method factorizes the data into (i) globally shared, (ii) row-shared, (iii) column-shared, and (iv) single-matrix structural components, facilitating the investigation of shared and unique patterns of variability. We use a penalized objective function that extends the nuclear norm penalty, and we determine the penalties via random matrix theory. We apply our method to integrate mRNA and miRNA expression data across tumor and normal samples. R code is available at https://github.com/lockEF/bidifac.
|