Online Program

Friday, February 21
Fri, Feb 21, 2:00 PM - 3:30 PM
Regency C
Big Data - Big Problems

WITHDRAWN: Ensemble Imputation for DNA Methylation Levels Across Platforms (303978)

*Gang Li, The University of North Carolina at Chapel Hill 

Keywords: Ensemble Learning, Imputation, Cross-validation, Epigenetics

DNA methylation at CpG dinucleotides is one of the most extensively studied epigenetic marks because it regulates gene expression and is critical to normal development. With technological advances, geneticists can profile DNA methylation at increasingly high resolution. However, profiling platforms differ in resolution, which hinders joint analysis. Here, we extend our penalized functional regression (PFR) method to impute from HM450 to the ~850K CpG sites on the HumanMethylationEPIC (HM850) BeadChip, ensembling it with four additional methods (K-nearest neighbors, logistic regression, random forests, and XGBoost) to further increase imputation accuracy. We analyzed data from two population cohorts, ELGAN and PTSD, each measured on both HM450 and HM850. We assessed three datasets: ELGAN and PTSD separately, and a combined dataset with batch effects corrected using the ComBat function in R. The cross-validation results show that no single imputation method is uniformly best across the three datasets. The ensemble method, however, outperforms every individual method when imputing from HM450 to HM850, which will boost power for discovery in subsequent epigenome-wide association studies (EWAS).
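
To illustrate the general idea of ensembling imputers under cross-validation, here is a minimal sketch on synthetic data. It is not the authors' pipeline. The three base learners (a mean predictor, a one-variable linear fit, and k-nearest neighbors) are simple stand-ins for PFR, random forests, and XGBoost, and the combining rule (weights proportional to inverse cross-validated error) is an assumption, since the abstract does not say how the ensemble is formed. All function names and the toy data are hypothetical.

```python
import random
from statistics import mean

# Each "fit_*" function takes training data and returns a predict(x) function.
# x stands in for methylation at a measured probe, y for the probe to impute.

def fit_mean(xs, ys):
    m = mean(ys)
    return lambda x: m

def fit_linear(xs, ys):
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)

def fit_knn(xs, ys, k=3):
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return mean([y for _, y in nearest])
    return predict

def cv_mse(fit, xs, ys, n_folds=5):
    """Mean squared error of one imputer under n-fold cross-validation."""
    n = len(xs)
    sq_errors = []
    for fold in range(n_folds):
        test_idx = set(range(fold, n, n_folds))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        model = fit(train_x, train_y)
        sq_errors += [(model(xs[i]) - ys[i]) ** 2 for i in test_idx]
    return mean(sq_errors)

def fit_ensemble(fitters, xs, ys):
    """Weight each imputer by inverse CV error (assumed rule), then refit on all data."""
    errors = [cv_mse(f, xs, ys) for f in fitters]
    inverse = [1.0 / (e + 1e-12) for e in errors]
    total = sum(inverse)
    weights = [v / total for v in inverse]
    models = [f(xs, ys) for f in fitters]
    def predict(x):
        return sum(w * m(x) for w, m in zip(weights, models))
    return predict, weights

# Synthetic beta-value-like data in [0, 1]; not real methylation measurements.
random.seed(0)
xs = [random.random() for _ in range(60)]
ys = [min(1.0, max(0.0, 0.2 + 0.6 * x + random.gauss(0, 0.05))) for x in xs]

predict, weights = fit_ensemble([fit_mean, fit_linear, fit_knn], xs, ys)
```

Calling `predict(0.5)` returns the weighted average of the three base predictions at that value. Inverse-error weighting is only one simple option; a stacked model that learns the weights from held-out predictions is a common alternative.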