
All Times EDT

Abstract Details

Activity Number: 499 - New Methods for Machine Learning
Type: Contributed
Date/Time: Thursday, August 6, 2020, 10:00 AM to 2:00 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #312956
Title: Augmented Bagging as an Alternative to Random Forests
Author(s): Siyu Zhou* and Lucas Mentch
Companies: University of Pittsburgh and University of Pittsburgh
Keywords: Regularization; Bagging; Random Forests; Variable Importance; Model Selection

Random forests have remained among the most popular off-the-shelf machine learning methods since their inception in 2001. Recent work provides strong evidence that the randomization in random forests serves as a form of implicit regularization, making them ideal models in low signal-to-noise ratio settings. Our work here provides another means of regularization, namely, the inclusion of additional noise covariates in the model. The improvement from this sort of "augmented" bagging procedure can sometimes exceed that of traditional random forests. More importantly, this has crucial implications for metrics designed to measure variable importance, many of which compare model performance with vs. without a set of features included. Our work implies that such model improvements can exist even when the features of interest are completely independent of the remaining data. Thus, we advocate comparing model performance with the original features against performance where feature subsets are replaced by random substitutes.
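The two ideas in the abstract can be illustrated with a minimal sketch, assuming scikit-learn's BaggingRegressor as the bagging implementation; the simulated data, the number of appended noise covariates, and all hyperparameters below are illustrative choices, not taken from the paper.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, p = 300, 5
X = rng.normal(size=(n, p))
# Low signal-to-noise response: only the first two features carry signal.
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=2.0, size=n)

def cv_mse(X, y):
    """5-fold cross-validated MSE of a bagged-tree ensemble."""
    scores = cross_val_score(
        BaggingRegressor(n_estimators=100, random_state=0),
        X, y, scoring="neg_mean_squared_error", cv=5)
    return -scores.mean()

# "Augmented" bagging: append pure-noise covariates before bagging,
# so the extra randomness acts as a form of regularization.
X_aug = np.hstack([X, rng.normal(size=(n, 20))])

mse_bag = cv_mse(X, y)
mse_aug = cv_mse(X_aug, y)

# Importance of feature 0, in the spirit the abstract advocates:
# compare against a copy where the feature is replaced by an
# independent random substitute rather than simply dropped.
X_sub = X.copy()
X_sub[:, 0] = rng.normal(size=n)
mse_sub = cv_mse(X_sub, y)

print(f"plain bagging CV-MSE:     {mse_bag:.3f}")
print(f"augmented bagging CV-MSE: {mse_aug:.3f}")
print(f"feature 0 substituted:    {mse_sub:.3f}")
```

Whether augmentation helps on a given dataset depends on the signal-to-noise ratio; the point of the substitution comparison is that it isolates the value of a feature's actual content from the regularizing effect of merely having more columns.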

Authors who are presenting talks have a * after their name.
