Abstract Details

Activity Number: 292 - Inferential Thinking in a Machine Learning World
Type: Invited
Date/Time: Wednesday, August 11, 2021, 3:30 PM to 5:20 PM EDT
Sponsor: Section on Statistical Learning and Data Science
Abstract #316616
Title: Why Random Forests Work and Why That’s a Problem
Author(s): Siyu Zhou* and Lucas Mentch
Companies: University of Pittsburgh and University of Pittsburgh
Keywords: Random Forest; Bagging; Regularization; Variable Importance; Model Selection
Abstract:

Despite their well-established record, a full and satisfying explanation for the success of random forests has yet to be put forth. Here, we take a step in this direction. Comparing against bagging with non-randomized base learners, we demonstrate that random forests are implicitly regularized by the additional randomness injected into individual trees, making them highly advantageous in low signal-to-noise ratio (SNR) settings. Furthermore, we show that this regularization property is not unique to tree-based ensembles and can be generalized to other supervised learning procedures. Motivated by this, we find that another surprising and counterintuitive means of regularizing ensembles can come from the inclusion of additional random noise features in the model. Importantly, this leads to substantial concerns about common notions of variable importance based on improved model accuracy, as even purely random noise can routinely register as statistically significant.
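The following is a minimal illustrative sketch, not the authors' code: it assumes a Python/scikit-learn setup and a made-up low-SNR data-generating process (weak linear signal plus appended pure-noise features) to show the two phenomena described in the abstract, namely feature subsampling acting as regularization relative to bagging, and random noise features still receiving nonzero importance.

    # Illustrative sketch only; data-generating process and parameters are assumptions.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n, p_signal, p_noise = 500, 5, 20
    X_signal = rng.normal(size=(n, p_signal))
    X_noise = rng.normal(size=(n, p_noise))            # purely random noise features
    X = np.hstack([X_signal, X_noise])
    # Weak linear signal buried in noise -> low signal-to-noise ratio setting
    y = X_signal @ np.full(p_signal, 0.2) + rng.normal(scale=1.0, size=n)

    # Bagging uses all features at each split; the random forest subsamples them
    bagging = RandomForestRegressor(n_estimators=200, max_features=1.0, random_state=0)
    forest = RandomForestRegressor(n_estimators=200, max_features="sqrt", random_state=0)
    for name, model in [("bagging (no feature randomization)", bagging),
                        ("random forest (mtry = sqrt(p))", forest)]:
        mse = -cross_val_score(model, X, y, cv=5,
                               scoring="neg_mean_squared_error").mean()
        print(f"{name}: cross-validated MSE = {mse:.3f}")

    # Impurity-based importance assigned to the 20 pure-noise features
    forest.fit(X, y)
    noise_share = forest.feature_importances_[p_signal:].sum()
    print(f"share of importance assigned to noise features: {noise_share:.2f}")

In this kind of simulation the randomized forest typically attains lower cross-validated error than bagging in the low-SNR regime, while the noise features still collect a nontrivial share of the importance, which is the concern about accuracy-based variable importance raised in the abstract.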


Authors who are presenting talks have a * after their name.
