
Abstract Details

Activity Number: 331 - Statistical and Practical Issues for Reproducible Molecular Prediction in Biomedical Studies
Type: Topic Contributed
Date/Time: Tuesday, July 31, 2018 : 10:30 AM to 12:20 PM
Sponsor: ENAR
Abstract #329695
Title: The Impact of Different Sources of Heterogeneity on Loss of Accuracy from Genomic Prediction Models
Author(s): Levi Waldron*
Companies: CUNY School of Public Health
Keywords: cross-study validation; predictive modeling

Cross-study validation (CSV) of prediction models is an alternative to traditional cross-validation (CV) in domains where multiple comparable datasets are available. Although many studies have noted potential sources of heterogeneity in genomic studies, to our knowledge none have systematically investigated their intertwined impacts on prediction accuracy across studies. We employ a hybrid parametric/non-parametric bootstrap method to realistically simulate publicly available compendia of microarray, RNA-seq, and whole metagenome shotgun (WMS) microbiome studies of health outcomes. We assess CSV accuracy while manipulating the following types of heterogeneity, alone and in combination: 1) prevalence of clinical and pathological covariates, 2) differences in predictor covariance, as could arise from batch effects, and 3) differences in the "true" model predicting outcome. The most easily identifiable sources of study heterogeneity are consistently not the primary ones that prevent the accuracy of omics prediction models from replicating in new studies. Unidentified heterogeneity, such as could arise from unmeasured confounding, may be more important.
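
To make the CV/CSV distinction concrete, here is a minimal sketch of a cross-study validation matrix, assuming ridge regression as the prediction model and Pearson correlation between predicted and observed outcome as the accuracy measure. It is not the hybrid parametric/non-parametric bootstrap described above: the simulation is a hypothetical stand-in that varies only heterogeneity type 3 (study-specific "true" coefficients, controlled by the illustrative parameter beta_sd). All function names are hypothetical. Entry (i, j) of the matrix is the accuracy of a model trained on study i and validated on study j; the diagonal holds ordinary within-study k-fold CV for comparison.

```python
import numpy as np


def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge coefficients (no intercept: simulated data are mean-zero)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def score(beta, X, y):
    """Accuracy measure: Pearson correlation of predicted vs. observed outcome."""
    return float(np.corrcoef(X @ beta, y)[0, 1])


def cv_score(X, y, k=5, lam=1.0, seed=0):
    """Ordinary k-fold cross-validation within a single study."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        beta = ridge_fit(X[train], y[train], lam)
        scores.append(score(beta, X[test], y[test]))
    return float(np.mean(scores))


def csv_matrix(studies, lam=1.0):
    """Entry (i, j): model trained on study i, validated on study j.
    The diagonal holds within-study CV accuracy instead."""
    s = len(studies)
    M = np.empty((s, s))
    for i, (Xi, yi) in enumerate(studies):
        beta = ridge_fit(Xi, yi, lam)
        for j, (Xj, yj) in enumerate(studies):
            M[i, j] = cv_score(Xi, yi, lam=lam) if i == j else score(beta, Xj, yj)
    return M


def simulate_studies(n_studies=4, n=200, p=20, beta_sd=0.0, seed=1):
    """Hypothetical simulation: each study's true coefficients are a shared
    vector plus study-specific Gaussian noise with SD beta_sd."""
    rng = np.random.default_rng(seed)
    beta0 = rng.normal(size=p)
    studies = []
    for _ in range(n_studies):
        beta = beta0 + rng.normal(scale=beta_sd, size=p)
        X = rng.normal(size=(n, p))
        y = X @ beta + rng.normal(scale=2.0, size=n)
        studies.append((X, y))
    return studies


def cv_csv_gap(M):
    """Mean within-study CV accuracy minus mean cross-study accuracy."""
    off = ~np.eye(M.shape[0], dtype=bool)
    return float(np.mean(np.diag(M)) - np.mean(M[off]))
```

For example, comparing csv_matrix(simulate_studies(beta_sd=0.0)) with csv_matrix(simulate_studies(beta_sd=2.0)) should show a near-zero CV/CSV gap for homogeneous studies and a much larger one when the true model differs across studies, which is the kind of accuracy loss that within-study CV alone would not reveal.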

Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program