Abstract:
|
Omics data facilitates new insights into the etiologic mechanisms of disease, but presents many analytic challenges. We present an approach for the integrated analysis of germline, omic, and disease data. We use a latent variable to relate information from germline genetic data to a disease outcome. Within a measurement error framework, the omic data is viewed as a flawed measure of underlying latent clusters, which are treated as categorical to simplify interpretation. We use an expectation-maximization (EM) algorithm to estimate the unobserved latent clusters and the model parameters, including the genetic effects on the latent cluster and the effects of the cluster on omic patterns and on the disease outcome. We incorporate penalized methods for variable selection in a high-dimensional setting for both the genetic data and the omic data. Using simulations, we demonstrate the ability of our approach to accurately estimate the underlying clusters and their corresponding genetic, omic, and disease effects. We also demonstrate that variable selection identifies genetic and omic factors as both the means and the correlation structures are varied. We discuss extensions to accommodate ascertainment and missing data.
|
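As a rough illustration of the E-step described in the abstract, the sketch below computes posterior cluster probabilities for a single subject. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `e_step`, all parameter names, the multinomial logistic model for cluster membership given genotype, the independent normal model for the omic features, and the Bernoulli model for a binary disease outcome are all hypothetical choices made here for illustration.

```python
import math


def e_step(g, z, y, beta, mu, sd, gamma):
    """Posterior cluster probabilities for one subject.

    Assumed (hypothetical) model, for illustration only:
      g     : genetic features for the subject
      z     : omic features for the subject
      y     : binary disease outcome (0 or 1)
      beta  : per-cluster genetic coefficients (multinomial logistic)
      mu    : per-cluster omic means
      sd    : per-feature omic standard deviations, shared across clusters
      gamma : per-cluster log-odds of disease
    """
    # Cluster prior given genotype: softmax of the linear predictors.
    lin = [sum(b * gi for b, gi in zip(bj, g)) for bj in beta]
    top = max(lin)
    prior = [math.exp(v - top) for v in lin]
    total = sum(prior)

    log_post = []
    for j, pj in enumerate(prior):
        lp = math.log(pj / total)
        # Omic data as a noisy measure of the cluster: independent normals.
        for zk, mk, sk in zip(z, mu[j], sd):
            lp += -0.5 * math.log(2 * math.pi * sk ** 2)
            lp -= (zk - mk) ** 2 / (2 * sk ** 2)
        # Disease outcome: Bernoulli with cluster-specific log-odds.
        p = 1.0 / (1.0 + math.exp(-gamma[j]))
        lp += math.log(p if y == 1 else 1.0 - p)
        log_post.append(lp)

    # Normalize on the log scale for numerical stability.
    top = max(log_post)
    weights = [math.exp(v - top) for v in log_post]
    norm = sum(weights)
    return [w / norm for w in weights]
```

In a full EM procedure, these probabilities would serve as weights in the M-step updates of the genetic, omic, and disease parameters; the penalized variable selection mentioned in the abstract is not shown here.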