
Abstract Details

Activity Number: 353 - SPEED: Statistical Learning and Data Science Speed Session 2, Part 2
Type: Contributed
Date/Time: Tuesday, July 30, 2019 : 10:30 AM to 11:15 AM
Sponsor: Section on Statistical Learning and Data Science
Abstract #307718
Title: Smoothing Random Forest
Author(s): Benjamin LeRoy* and Max G'Sell
Companies: Carnegie Mellon University and Carnegie Mellon University
Keywords: Regression; Classification; Smoothing; Out-of-Bag; OOB; Overfitting

Since their introduction by Breiman (2001), random forests have proved to be a high-performing, general-purpose algorithm that requires little tuning. Although analyses such as Breiman (2001) and Biau (2012) suggest that the estimator converges to its mean under the random generation of trees, so that the number of trees should not be seen as a source of overfitting, experiments (Segal (2004)) and theory (Biau (2012); Denil et al. (2014)) suggest that the random forest algorithm's use of fully grown trees can lead to overfitting and higher asymptotic risk. We provide a way to smooth a random forest (in both the classification and regression settings), which can be framed as reducing the depth of the trees within the forest. We present simulation studies showing that this smoothing can be useful in low-density regions where the underlying function changes less smoothly (using examples from Criminisi et al. (2012)), new examples showcasing the extent of this overfitting in higher dimensions, and results on commonly used statistical machine learning datasets.
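The depth-reduction framing above can be illustrated with a minimal sketch. This is not the authors' smoothing algorithm: as a stand-in assumption, each "tree" here is a piecewise-constant fit on equal-width bins of [0, 1), so many bins mimic a fully grown tree (roughly one training point per leaf) and few bins mimic a shallower, smoothed tree; bagging over bootstrap resamples plays the role of the forest.

```python
import random

# Hypothetical illustration (assumed setup, not the paper's method):
# a "tree" = piecewise-constant fit on equal-width bins of [0, 1);
# more bins behaves like a deeper tree, fewer bins like a smoothed one.

def fit_tree(xs, ys, n_bins):
    """Average the responses within each bin (one 'leaf' per bin)."""
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for x, y in zip(xs, ys):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    overall = sum(ys) / len(ys)  # fallback for leaves left empty by bootstrap
    return [sums[b] / counts[b] if counts[b] else overall
            for b in range(n_bins)]

def predict_tree(tree, x):
    return tree[min(int(x * len(tree)), len(tree) - 1)]

def fit_forest(xs, ys, n_trees, n_bins, rng):
    """Bag depth-controlled trees on bootstrap resamples."""
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
        forest.append(fit_tree([xs[i] for i in idx],
                               [ys[i] for i in idx], n_bins))
    return forest

def predict_forest(forest, x):
    return sum(predict_tree(t, x) for t in forest) / len(forest)

rng = random.Random(0)
n = 400
xs = [rng.random() for _ in range(n)]
ys = [x + rng.gauss(0.0, 0.5) for x in xs]  # truth f(x) = x, noisy responses

deep = fit_forest(xs, ys, n_trees=25, n_bins=n, rng=rng)      # ~1 point/leaf
smoothed = fit_forest(xs, ys, n_trees=25, n_bins=8, rng=rng)  # shallow trees

grid = [i / 200 for i in range(200)]
deep_mse = sum((predict_forest(deep, x) - x) ** 2
               for x in grid) / len(grid)
smoothed_mse = sum((predict_forest(smoothed, x) - x) ** 2
                   for x in grid) / len(grid)
```

On this toy example the shallow (smoothed) forest attains lower test error than the fully grown one, echoing the abstract's point that fully grown trees carry excess variance that depth reduction can remove.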

Authors who are presenting talks have a * after their name.

Back to the full JSM 2019 program