Abstract Details

Activity Number: 361
Type: Contributed
Date/Time: Tuesday, August 2, 2016 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #321090
Title: Methodological Strategies to Define a Generalizable Model for Machine Learning Ensemble Techniques
Author(s): Joel Correa da Rosa* and Lewis Tomalin and Mayte Suárez-Fariñas
Companies: Rockefeller University and Icahn School of Medicine at Mount Sinai and Icahn School of Medicine at Mount Sinai
Keywords: ensemble ; machine learning ; glm ; predictive models ; bagging ; genomics

Ensemble-based methods, including bagging, stacking of predictors, and random forests, have been in use for a long time. These techniques are applied to improve predictive performance, stabilize feature selection, and reduce the variance of automated decision-making systems. Although there are good references on how to use different subsets of the training data to achieve diversity in ensembles and to obtain robust estimates of predictor performance, the definition of a final, generalizable model is often overlooked. Ensemble modeling is computationally intensive, and formulating a final model is crucial for disseminating and implementing the predictor on a large scale. To help fill this gap, we present and compare several strategies for defining what we call a "final model" for classification and regression problems when using the Elastic Net for Generalized Linear Models (GLMnet). Theoretical and practical aspects of each strategy are discussed, and two applications to high-throughput genomic data, one for regression and the other for binary classification, are presented: a final predictor of response to treatment in psoriatic patients and another of a severity index in the same dataset.
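The abstract does not list the strategies it compares, so the following is only a minimal sketch of one conceivable way to collapse a bagged elastic-net ensemble into a single model: averaging the coefficients across bootstrap fits. It should not be read as the authors' method. The code is a pure-Python stand-in for GLMnet (Gaussian family only, no standardization, fixed penalty), and every name in it (`fit_elastic_net`, `bagged_fits`, `average_fits`, `predict`) as well as the synthetic data are assumptions made for illustration.

```python
import random

def soft_threshold(z, g):
    """Soft-thresholding operator used by the L1 part of the penalty."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0

def fit_elastic_net(X, y, lam=0.1, alpha=0.5, n_sweeps=50):
    """Coordinate descent for
    (1/2n)*||y - b0 - Xb||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2).
    Illustrative only: no standardization, no convergence check."""
    n, p = len(X), len(X[0])
    b0, b = sum(y) / n, [0.0] * p
    resid = [yi - b0 for yi in y]  # residuals of the current fit
    for _ in range(n_sweeps):
        for j in range(p):
            col = [row[j] for row in X]
            # Covariance of feature j with the partial residual
            # (current residual with feature j's contribution added back).
            rho = sum(c * (r + c * b[j]) for c, r in zip(col, resid)) / n
            scale = sum(c * c for c in col) / n + lam * (1.0 - alpha)
            new_bj = soft_threshold(rho, lam * alpha) / scale
            resid = [r - c * (new_bj - b[j]) for c, r in zip(col, resid)]
            b[j] = new_bj
        shift = sum(resid) / n  # re-center the (unpenalized) intercept
        b0 += shift
        resid = [r - shift for r in resid]
    return b0, b

def bagged_fits(X, y, n_models=20, seed=0):
    """Fit one elastic-net model per bootstrap resample of the rows."""
    rng = random.Random(seed)
    n = len(X)
    fits = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        fits.append(fit_elastic_net([X[i] for i in idx], [y[i] for i in idx]))
    return fits

def average_fits(fits):
    """Candidate 'final model': the mean intercept and mean coefficients."""
    m, p = len(fits), len(fits[0][1])
    b0 = sum(f[0] for f in fits) / m
    b = [sum(f[1][j] for f in fits) / m for j in range(p)]
    return b0, b

def predict(fit, x):
    """Linear prediction for one observation x from an (intercept, coefs) pair."""
    b0, b = fit
    return b0 + sum(xj * bj for xj, bj in zip(x, b))

# Synthetic data (an assumption for the demo, not the psoriasis dataset).
_rng = random.Random(1)
X = [[_rng.gauss(0, 1) for _ in range(5)] for _ in range(40)]
y = [2.0 * row[0] - 1.0 * row[1] + _rng.gauss(0, 0.1) for row in X]

fits = bagged_fits(X, y)
final_model = average_fits(fits)
```

For a linear (identity-link) model, the averaged-coefficient model gives the same predictions as averaging the ensemble's predictions, which is what makes it a compact substitute for the full ensemble. For binary classification with a logistic link this equivalence does not hold: averaging coefficients averages the log-odds rather than the predicted probabilities, so the two can differ.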

Authors who are presenting talks have a * after their name.

Back to the full JSM 2016 program