Abstract Details

Activity Number: 81 - New Development in Epigenome-Wide Association Studies
Type: Contributed
Date/Time: Sunday, July 29, 2018 : 4:00 PM to 5:50 PM
Sponsor: Section on Statistics in Genomics and Genetics
Abstract #330115
Title: Data Adaptive Evaluation of Preprocessing Methods Using Ensemble Machine Learning
Author(s): Rachael Phillips*
Companies: Biostatistics, UC Berkeley
Keywords: machine learning; genomics; preprocessing; epigenetics; normalization; genetics
Abstract:

For many types of biological data generated by high-throughput technologies, there is no single gold standard for converting the raw data into a form that can be analyzed for relationships of the relevant biomarkers to exposures and disease. For example, much of the variation in the raw data generated by Illumina HumanMethylationEPIC and 450K arrays is due to the experimental design (comprising two different assay methods, two different color channels, and batch effects) and potentially less to the biological factor(s) of interest. Accordingly, several preprocessing methods have been developed. However, it is unclear which combination of these methods should be retained for downstream analysis. To tackle this issue, we have developed a data-adaptive methodology that incorporates ensemble machine learning to assess which preprocessing streams generate a better signal-to-noise ratio. We employ this method to select normalizations for EPIC and 450K arrays in a principled way. The results suggest 1) that the possible preprocessing choices differ in their relative performance and in the ultimate quality of the data they produce, and 2) that such machine learning approaches can be practically applied to complex omics data.
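
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the author's implementation. It scores two hypothetical preprocessing streams (`raw` and `batch_centered`) by how well a small ensemble predicts a simulated exposure from the processed biomarker, using cross-validated R^2 as a stand-in for signal-to-noise. The simulated data, the stream names, the two base learners, and the plain averaging of their predictions are all illustrative assumptions; a real ensemble would typically weight its learners by cross-validated performance.

```python
import random
import statistics


def simulate(n=400, seed=0):
    """Toy data (assumption): one biomarker = biological signal + batch shift + noise."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        batch = i % 4                      # four hypothetical batches
        exposure = rng.gauss(0.0, 1.0)     # biological factor of interest
        raw = 0.5 * exposure + 1.0 * batch + rng.gauss(0.0, 0.5)
        rows.append((batch, exposure, raw))
    return rows


def stream_raw(rows):
    """Hypothetical stream 1: no preprocessing."""
    return [raw for _, _, raw in rows]


def stream_batch_centered(rows):
    """Hypothetical stream 2: subtract each batch's mean (does not use the exposure)."""
    batches = sorted({b for b, _, _ in rows})
    means = {b: statistics.fmean(r for bb, _, r in rows if bb == b) for b in batches}
    return [raw - means[b] for b, _, raw in rows]


def fit_ols(x, y):
    """Base learner 1: simple linear regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx if sxx else 0.0
    return lambda v: my + slope * (v - mx)


def fit_knn(x, y, k=10):
    """Base learner 2: mean outcome of the k nearest training points."""
    pairs = list(zip(x, y))

    def predict(v):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - v))[:k]
        return statistics.fmean(p[1] for p in nearest)

    return predict


def fit_ensemble(x, y):
    """Simplified ensemble: unweighted average of the base learners' predictions."""
    learners = [fit_ols(x, y), fit_knn(x, y)]
    return lambda v: statistics.fmean(f(v) for f in learners)


def cv_r2(x, y, folds=5):
    """Cross-validated R^2 of the ensemble, pooled over held-out folds."""
    n = len(x)
    sse = sst = 0.0
    for f in range(folds):
        test = [i for i in range(n) if i % folds == f]
        train = [i for i in range(n) if i % folds != f]
        model = fit_ensemble([x[i] for i in train], [y[i] for i in train])
        ybar = statistics.fmean(y[i] for i in train)
        sse += sum((y[i] - model(x[i])) ** 2 for i in test)
        sst += sum((y[i] - ybar) ** 2 for i in test)
    return 1.0 - sse / sst


rows = simulate()
exposure = [e for _, e, _ in rows]
streams = {"raw": stream_raw, "batch_centered": stream_batch_centered}
scores = {name: cv_r2(fn(rows), exposure) for name, fn in streams.items()}
best = max(scores, key=scores.get)
print(scores, "-> selected:", best)
```

In this toy setting the batch shift is pure nuisance variation, so the stream that removes it should let the ensemble recover more of the exposure signal on held-out samples. With real array data, each stream would instead be a full normalization pipeline applied to the same raw intensities.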


Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program