Abstract:
|
We consider the problem of extracting shared and individual signals from multi-view data, that is, a collection of high-dimensional data matrices representing different types of measurements on the same samples. While existing methods for such decomposition explore single matching across matrix rows (corresponding to samples), we focus on the case of double matching (both by samples and by the types of measurements in the columns). The motivating example is gene expression data from The Cancer Genome Atlas project, collected from both primary tumor and adjacent normal tissue of the same subjects. The two corresponding data matrices are thus matched both by subjects and by genes. We evaluate the performance of existing methods designed for single-matched data and demonstrate that their rank determination is inconsistent because they do not take double matching into account. We propose a new approach for rank determination that avoids this problem and allows shared and individual signals to be extracted across subjects and across genes simultaneously.
|