Abstract:
Ensemble machine learning methods are often used when the true prediction function is not easily approximated by a single algorithm. Ensembles carry a substantial computational cost, however, since they require training multiple base learning algorithms. Practitioners may prefer ensemble algorithms when predictive performance is valued above other factors such as model complexity and training time. We will present several practical solutions for reducing the computational burden of ensemble learning while retaining superior model performance. The two projects described below each provide an R interface that offers easy access to scalable ensemble learning.
H2O Ensemble is an implementation of the Super Learner (i.e., stacking) ensemble algorithm that uses distributed base learning algorithms (including Random Forests and Deep Neural Nets) from the open source machine learning platform, H2O. Subsemble is a general subset ensemble prediction method, which partitions the training data into subsets, fits base learning algorithms on each subset, and uses a specialized form of V-fold cross-validation to output a prediction function that combines the subset-specific fits.
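The stacking procedure underlying both projects can be sketched compactly. The following is a minimal, illustrative NumPy sketch of the Super Learner idea only, not the H2O Ensemble or subsemble API (both packages expose an R interface and use far richer base learners); the toy data, polynomial base learners, and least-squares metalearner are assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (hypothetical stand-in for a real training set).
n = 200
X = rng.uniform(-2, 2, size=(n, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

# Two simple base learners: degree-1 and degree-3 polynomial fits.
def fit_poly(deg):
    def fit(X, y):
        coefs = np.polyfit(X[:, 0], y, deg)
        return lambda Xnew: np.polyval(coefs, Xnew[:, 0])
    return fit

base_learners = [fit_poly(1), fit_poly(3)]

# V-fold cross-validation: collect out-of-fold ("level-one") predictions,
# so the metalearner never sees a base learner's fit to its own data.
V = 5
folds = np.array_split(rng.permutation(n), V)
Z = np.zeros((n, len(base_learners)))
for holdout in folds:
    train = np.setdiff1d(np.arange(n), holdout)
    for j, fit in enumerate(base_learners):
        model = fit(X[train], y[train])
        Z[holdout, j] = model(X[holdout])

# Metalearner: least-squares weights on the level-one predictions.
w, *_ = np.linalg.lstsq(Z, y, rcond=None)

# Final ensemble: refit each base learner on all data, combine with w.
full_models = [fit(X, y) for fit in base_learners]
def ensemble_predict(Xnew):
    return np.column_stack([m(Xnew) for m in full_models]) @ w
```

Subsemble follows the same template, except that each base learner is fit on a disjoint subset of the rows rather than on the full training set, which is what makes the method attractive for large data.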
ASA Meetings Department
Copyright © American Statistical Association.