Abstract:
Integrative genomic analysis is a powerful tool for studying the biological mechanisms underlying a disease: it evaluates whether the disease is associated with genes across multiple genomic data types, such as DNA methylation, copy number variation and gene expression. A common practice is to analyze each data type separately and combine the results ad hoc, which loses statistical power and leaves the overall false discovery rate (FDR) uncontrolled. We propose IMIX, a multivariate mixture model framework that integrates multiple types of genomic data and allows the commonly adopted conditional independence assumption to be examined and relaxed. We investigate multi-class FDR control in IMIX, and extensive simulations show that, at a controlled overall FDR, it achieves lower misclassification rates than established strategies that analyze each data type individually, such as Benjamini-Hochberg FDR control, the q-value and family-wise error rate control. IMIX features statistically principled model selection, FDR control and computational efficiency. Applications to The Cancer Genome Atlas (TCGA) data provide novel multi-omic insights into the luminal/basal subtyping of bladder cancer and the prognosis of pancreatic cancer.
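The contrast the abstract draws, between thresholding p-values one data type at a time and controlling FDR through a mixture model's posterior class probabilities, can be illustrated with a toy sketch. It assumes a single data type, z-scores, and a two-component Gaussian mixture `pi0*N(0,1) + (1-pi0)*N(mu1, sigma1^2)` with known parameters. The helpers `benjamini_hochberg`, `local_fdr` and `lfdr_threshold` are hypothetical names, and the running-mean rule for local FDRs is a common mixture-model approach, not necessarily the exact IMIX procedure. IMIX's multivariate, multi-class model and its parameter estimation are not reproduced here.

```python
import math


def benjamini_hochberg(pvals, alpha):
    """Benjamini-Hochberg step-up: reject the k smallest p-values, where k is
    the largest rank with p_(k) <= k * alpha / m."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k = rank
    return sorted(order[:k])


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def local_fdr(z, pi0, mu1, sigma1):
    """Posterior probability that z comes from the null N(0, 1) component of the
    assumed mixture pi0*N(0,1) + (1-pi0)*N(mu1, sigma1^2) (parameters taken as known)."""
    f0 = pi0 * normal_pdf(z, 0.0, 1.0)
    f1 = (1 - pi0) * normal_pdf(z, mu1, sigma1)
    return f0 / (f0 + f1)


def lfdr_threshold(lfdrs, alpha):
    """Declare non-null the largest set of smallest local FDRs whose running
    mean (an estimate of the FDR of that set) stays at or below alpha."""
    order = sorted(range(len(lfdrs)), key=lambda i: lfdrs[i])
    total, k = 0.0, 0
    for rank, i in enumerate(order, start=1):
        total += lfdrs[i]
        if total / rank <= alpha:
            k = rank
    return sorted(order[:k])


# Toy usage with made-up inputs (not data from the paper).
pvals = [0.001, 0.8, 0.025, 0.015, 0.5]
print(benjamini_hochberg(pvals, 0.05))

zs = [0.1, -0.5, 3.2, 4.0, 0.3, 2.8]
lfdrs = [local_fdr(z, pi0=0.7, mu1=3.0, sigma1=1.0) for z in zs]
print(lfdr_threshold(lfdrs, 0.05))
```

Because the sorted local FDRs are nondecreasing, their running mean is too, so the rejected set is always a prefix of the sorted order. Extending this idea to several data types requires a joint model over all combinations of association states, which is where a multivariate mixture such as IMIX comes in.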