

Abstract Details

Activity Number: 75 - Invited EPoster Session II
Type: Invited
Date/Time: Sunday, August 7, 2022, 9:35 PM to 10:30 PM (EDT)
Sponsor: Section on Statistical Learning and Data Science
Abstract #320792
Title: Random Forests: Why They Work and Why That's a Problem
Author(s): Lucas Mentch* and Siyu Zhou
Companies: University of Pittsburgh and University of Pittsburgh
Keywords: Bagging; CART; Random Forest; Degrees of Freedom; Variable Importance; Double Descent
Abstract:

Random forests remain among the most popular off-the-shelf supervised machine learning tools, with a well-established track record of predictive accuracy in both regression and classification settings. Despite this empirical success, a full and satisfying explanation for why they work has yet to be put forth. Here we show that the additional randomness injected into individual trees serves as a form of implicit regularization, making random forests an ideal model in low signal-to-noise ratio (SNR) settings. From a model-complexity perspective, this means that the mtry parameter in random forests serves much the same purpose as the shrinkage penalty in explicit regularization procedures such as the lasso. Building on this insight, we demonstrate that alternative forms of randomness can provide similarly beneficial stabilization. In particular, we show that augmenting the feature space with additional features consisting of only random noise can substantially improve the predictive accuracy of the model. This surprising fact has been largely overlooked within the statistics community, but it has crucial implications for how best to measure variable importance.
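The following is a minimal sketch of the two claims above, not the authors' experiments: it builds a synthetic low-SNR regression problem and uses scikit-learn's RandomForestRegressor, whose max_features argument plays the role of mtry. The data-generating model, sample sizes, and the choice of 20 injected noise features are all illustrative assumptions.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic low-SNR regression problem: 3 informative features out of 10,
# buried in heavy noise (all numbers here are illustrative choices).
rng = np.random.default_rng(0)
n, p = 500, 10
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = 1.0
y = X @ beta + rng.normal(scale=5.0, size=n)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Claim 1: shrinking mtry (max_features) acts as stronger implicit
# regularization, which helps when the SNR is low.
for mtry in (p, p // 2, 2):
    rf = RandomForestRegressor(n_estimators=300, max_features=mtry, random_state=0)
    rf.fit(X_tr, y_tr)
    print(f"mtry={mtry:2d}  test MSE: {mean_squared_error(y_te, rf.predict(X_te)):.2f}")

# Claim 2: with mtry held fixed at p, padding the design matrix with k
# pure-noise columns dilutes each node's candidate split set, injecting
# extra randomness that regularizes in much the same way.
k = 20
X_tr_aug = np.hstack([X_tr, rng.normal(size=(X_tr.shape[0], k))])
X_te_aug = np.hstack([X_te, rng.normal(size=(X_te.shape[0], k))])
rf_aug = RandomForestRegressor(n_estimators=300, max_features=p, random_state=0)
rf_aug.fit(X_tr_aug, y_tr)
print(f"noise-augmented (mtry={p})  test MSE: {mean_squared_error(y_te, rf_aug.predict(X_te_aug)):.2f}")

On a low-SNR draw like this one, the smaller-mtry and noise-augmented forests will typically (though not always) post lower test error than the mtry = p baseline, mirroring the stabilization effect the abstract describes.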


Authors who are presenting talks have a * after their name.
