Abstract:
|
Ensemble-based methods, including bagging, stacking of predictors and random forests, have been in use for a long time. These techniques are applied to improve predictive performance, stabilize feature selection and reduce the variance of automated decision-making systems. Although good references exist on how to use different subsets of the training data to achieve diversity within an ensemble and to obtain robust estimates of predictor performance, the definition of a final, generalizable model is often overlooked. The modeling behind an ensemble is computationally intensive, so formulating a single final model is crucial for disseminating and deploying the predictor at large scale. To help fill this gap, we present and compare several strategies for defining what we call the "final model" for classification and regression problems when using the Elastic Net for Generalized Linear Models (GLMnet). The theoretical and practical aspects of each strategy are discussed, and two applications to high-throughput genomic data, one for regression and one for binary classification, are presented: a final predictor of response to treatment in psoriatic patients and another of a severity index in the same dataset.
|
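To make the idea of a "final model" concrete, here is a minimal Python sketch of one simple strategy. It is assumed for illustration and is not necessarily one of the strategies compared in the paper. An elastic net is fitted on many bootstrap resamples, and the ensemble is then collapsed into a single linear model by averaging the intercepts and coefficients. The sketch uses only NumPy, with a hand-written coordinate-descent solver for the Gaussian objective that glmnet minimizes, (1/(2n))||y - b0 - X b||^2 + lam * ((1 - alpha)/2 ||b||^2 + alpha ||b||_1). The function names, the values of lam and alpha, and the number of resamples are hypothetical choices.

```python
import numpy as np

def soft_threshold(z, gamma):
    # Proximal operator of the L1 penalty.
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def fit_elastic_net(X, y, lam, alpha, max_iter=500, tol=1e-8):
    """Hypothetical helper: coordinate descent for the Gaussian elastic net
    (1/(2n))||y - b0 - X b||^2 + lam*((1-alpha)/2 ||b||^2 + alpha ||b||_1),
    with an unpenalized intercept handled by centering X and y."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    col_sq = (Xc ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    resid = yc.copy()                      # residual for the current beta
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] <= 0.0:           # constant column in this resample
                continue
            old = beta[j]
            resid += Xc[:, j] * old        # partial residual without feature j
            rho = Xc[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            resid -= Xc[:, j] * beta[j]
            max_change = max(max_change, abs(beta[j] - old))
        if max_change < tol:
            break
    intercept = y_mean - x_mean @ beta     # recover intercept from centering
    return intercept, beta

def bagged_final_model(X, y, lam, alpha, n_boot=50, seed=0):
    """Assumed strategy: fit on bootstrap resamples, then average the fits
    into one (intercept, coefficients) pair; also report selection frequency."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    intercepts, betas = [], []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # sample rows with replacement
        b0, b = fit_elastic_net(X[idx], y[idx], lam, alpha)
        intercepts.append(b0)
        betas.append(b)
    betas = np.array(betas)
    selection_freq = (betas != 0).mean(axis=0)
    return float(np.mean(intercepts)), betas.mean(axis=0), selection_freq

# Usage on synthetic data: y depends only on the first two of ten features.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 10))
y = 1.0 + 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=100)
b0, beta, freq = bagged_final_model(X, y, lam=0.1, alpha=0.5)
print(round(b0, 2), np.round(beta[:3], 2), freq[:3])
```

Averaging yields a single coefficient vector that can be applied to new samples without refitting or storing the whole ensemble. The selection frequencies are a by-product that can indicate how stable each feature is across resamples. Note that the averaged model is generally less sparse than any single member, because a coefficient becomes nonzero whenever at least one resample selected that feature.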