Abstract:
|
Recent advances in high-throughput biomedical technologies have enabled the measurement of multiple high-dimensional omics data types in a single study, including genomics, epigenomics, transcriptomics and metabolomics. Each of these data types provides a different snapshot of the underlying biological system, and combining multiple data types has been shown to be valuable in investigating disease. The individual components of these data are functionally organized in networks or pathways, and incorporating such structural information can improve the analysis and lead to more biologically meaningful results. Canonical correlation analysis (CCA) is a classical method for extracting linear components that capture correlations between two multivariate random variables or data sets. We develop a Bayesian model that extends classical CCA to more than two data sets and can describe relationships between groups of variables. The learnt components enable a variety of downstream analyses, including the identification of sample subgroups, data imputation and the detection of outlier samples.
|