Abstract:
|
The NIH recently launched several precision medicine initiatives, which use high-throughput omics technologies to characterize molecular abnormalities or signatures associated with common diseases. The resulting trans-omics datasets (DNA sequences, RNA expression, methylation profiles, metabolomic profiles, etc.) provide unprecedented opportunities to understand disease pathobiology at the system level. Integrative analysis of data from multiple omics platforms poses enormous statistical challenges because of complex biological networks, high-dimensional data, and missing values. We will discuss these challenges in this talk, focusing on integrative association analysis with multiple types of incomplete omics data. Specifically, we consider a structural equation modelling framework that incorporates the biological relationships among different types of omics variables and formulates their effects on disease phenotypes. We propose a semiparametric approach to efficiently estimate the model parameters under arbitrary patterns of missing data and detection limits. We construct association tests that remain valid and efficient when unknown omics values are inferred from observed data.
|