Abstract Details

Activity Number: 274 - Random Forests in Big Data, Machine Learning and Statistics
Type: Invited
Date/Time: Tuesday, July 31, 2018 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Learning and Data Science
Abstract #326551
Title: Beyond the Bagg: Consistent Importance Intervals for Random Forest Predictors
Author(s): Lucas Mentch* and Giles Hooker
Companies: University of Pittsburgh and Cornell University
Keywords: Random Forests; Trees; CART; U-statistics; Variable Importance; Bagging

Despite the popularity of tree-based ensembles (bagging, boosting, random forests), these methods are often seen as prediction-only tools in which the interpretability of traditional statistical models is sacrificed for predictive accuracy. We present recent work suggesting that this black-box perspective need not always hold. We consider a general resampling scheme based on subsampling and demonstrate that the resulting predictions are equivalent to U-statistic estimators. As such, central limit theorems can be developed, allowing confidence intervals and hypothesis tests to be produced. Furthermore, the proposed test statistics provide a natural and consistent measure of variable importance that, unlike the popular out-of-bag (OOB) measures, is robust to covariate correlation structures. We demonstrate results on eBird citizen-science data and numerous other publicly available datasets, suggesting that these alternative importance measures operate in a familiar fashion and can reveal insights typically hidden by classic measures based on OOB error.
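The subsampling-plus-CLT idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it substitutes a 1-nearest-neighbor rule for a grown tree as the base learner, and all function names, parameter names, and default values (`k`, `m`, `n_outer`, `n_inner`) are illustrative choices. The two variance components estimated below (the between-learner variance and the variance of the conditional mean with one data point held fixed) play the role of the U-statistic variance terms that drive the confidence interval.

```python
import numpy as np

def nn_predict(X_sub, y_sub, x0):
    """Stand-in base learner: predict with the nearest training point
    in the subsample (a crude proxy for a deep decision tree)."""
    i = np.argmin(np.linalg.norm(X_sub - x0, axis=1))
    return y_sub[i]

def subsampled_ci(X, y, x0, k=30, m=400, n_outer=25, n_inner=40,
                  z=1.96, seed=0):
    """Sketch of a U-statistic-style confidence interval for an
    ensemble prediction at x0, built from m base learners each fit
    on a size-k subsample drawn without replacement."""
    rng = np.random.default_rng(seed)
    n = len(X)

    # Ensemble prediction: average of m subsample-trained base learners.
    preds = np.empty(m)
    for b in range(m):
        idx = rng.choice(n, size=k, replace=False)
        preds[b] = nn_predict(X[idx], y[idx], x0)
    theta = preds.mean()
    zeta_k = preds.var(ddof=1)  # variance of a single base prediction

    # Variance of the conditional mean when one observation is held
    # fixed and the rest of the subsample is redrawn (a Monte Carlo
    # stand-in for the leading U-statistic variance component).
    cond_means = np.empty(n_outer)
    for j in range(n_outer):
        i = rng.integers(n)
        others = np.delete(np.arange(n), i)
        inner = np.empty(n_inner)
        for b in range(n_inner):
            idx = np.append(rng.choice(others, size=k - 1, replace=False), i)
            inner[b] = nn_predict(X[idx], y[idx], x0)
        cond_means[j] = inner.mean()
    zeta_1 = cond_means.var(ddof=1)

    # CLT-based variance estimate for the incomplete U-statistic,
    # combining the fixed-point and between-learner components.
    var_hat = (k ** 2 / n) * zeta_1 + zeta_k / m
    half = z * np.sqrt(var_hat)
    return theta - half, theta + half
```

On simple synthetic data (e.g., `y` a noisy function of `X`), `subsampled_ci(X, y, x0)` returns an interval around the ensemble prediction at `x0`; the same machinery underlies the hypothesis tests for variable importance described above, where the test statistic compares predictions from ensembles trained with and without the covariate of interest.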

Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program