
Abstract Details

Activity Number: 50 - Machine Learning and Statistical Inference: Building Breiman's Bridge
Type: Invited
Date/Time: Sunday, July 30, 2017 : 4:00 PM to 5:50 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #321939
Title: Central Limit Theorems and Hypothesis Tests for Random Forests
Author(s): Lucas K Mentch* and Giles J Hooker
Companies: University of Pittsburgh and Cornell University
Keywords: Bootstrap ; Subsample ; U-statistic ; Random Forests ; Bagging ; Trees

Modern learning algorithms are often seen as prediction-only tools, meaning that the interpretability provided by traditional models is sacrificed for predictive accuracy. We argue that this black-box perspective need not be the case by developing formal statistical inference procedures for predictions generated by supervised learning ensembles. Ensemble methods based on bootstrapping, such as random forests, usually improve the accuracy and stability of individual trees but fail to provide a framework in which distributional results can be easily determined. Instead of aggregating full bootstrap samples, we consider a general resampling framework in which predictions are averaged over trees built on subsamples, and we demonstrate that the resulting estimator belongs to an extended class of U-statistics. We develop a corresponding central limit theorem, allowing confidence intervals to accompany predictions, as well as formal hypothesis tests for feature significance and additivity. Moreover, the internal estimation method we suggest allows these inference procedures to be carried out at no additional computational cost. Demonstrations are provided on the eBird citizen-science data.
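The subsampling-plus-internal-variance idea described above can be sketched as follows. This is our own illustrative simplification, not the authors' implementation: a 1-nearest-neighbor regressor stands in for a tree so the example needs only NumPy, the function and parameter names (`subsample_ensemble_ci`, `n_z`, `n_mc`) are invented for this sketch, and the variance formula is the standard two-term U-statistic approximation with the first term estimated from groups of subsamples that share one data point.

```python
import numpy as np

rng = np.random.default_rng(0)

def base_predict(Xs, ys, x0):
    # 1-nearest-neighbor regressor: a dependency-free stand-in for a
    # decision tree built on one subsample.
    return ys[np.argmin(np.abs(Xs - x0))]

def subsample_ensemble_ci(X, y, x0, k=30, n_z=25, n_mc=60):
    """Ensemble prediction at x0 with a CLT-based 95% confidence interval.

    Hypothetical sketch: average base learners built on size-k subsamples,
    then estimate the variance with the two-term U-statistic approximation
        Var ~ (k^2 / n) * zeta_1 + zeta_k / B,
    where zeta_1 is estimated from groups of subsamples that all share one
    point. The same learners serve both prediction and variance estimation,
    mirroring the "no additional computational cost" internal scheme.
    """
    n = len(X)
    group_means = np.empty(n_z)
    all_preds = []
    for i in range(n_z):
        z = rng.integers(n)                    # shared point for group i
        preds = np.empty(n_mc)
        for j in range(n_mc):
            idx = rng.choice(n, size=k, replace=False)
            idx[0] = z                         # force the shared point in
            preds[j] = base_predict(X[idx], y[idx], x0)
        group_means[i] = preds.mean()
        all_preds.append(preds)
    all_preds = np.concatenate(all_preds)
    est = all_preds.mean()
    zeta1 = group_means.var(ddof=1)            # between-group variance
    zetak = all_preds.var(ddof=1)              # single-learner variance
    var = (k ** 2 / n) * zeta1 + zetak / all_preds.size
    half = 1.96 * np.sqrt(var)                 # normal 95% quantile
    return est, est - half, est + half

# Toy data: y = x plus noise; predict at x0 = 0.5.
X = rng.uniform(0, 1, 500)
y = X + rng.normal(0, 0.1, 500)
est, lo, hi = subsample_ensemble_ci(X, y, 0.5)
```

Because each base learner sees a subsample rather than a full bootstrap sample, the averaged prediction inherits U-statistic structure, which is what licenses the normal-quantile interval in the last step.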

Authors who are presenting talks have a * after their name.

Back to the full JSM 2017 program

Copyright © American Statistical Association