Conference Program

All Times ET

Thursday, June 9
Computational Statistics
Machine Learning
New Models, Methods, and Applications I
Thu, Jun 9, 3:45 PM - 5:15 PM

Variable Importance Confidence Intervals within Random Forest (310163)


*Heather Lynn Cook, University of Southern Indiana 
Daniel Keenan, University of Virginia 
Douglas Lake, University of Virginia 

Keywords: random forest, confidence intervals, bootstrapping, variable importance

Very few methods are available for showing the variability of variable importance in methods such as random forest. Confidence intervals are used extensively in statistics and can be understood even by those with only introductory training, especially when shown graphically. In the proposed method, a random forest model is fit as usual; then, using the variable importance from each tree in the forest, bootstrapping is applied to create a confidence interval for each variable's importance. These confidence intervals may be compared with the current methods of Ishwaran and Lu (2018), with examples shown in R using a dataset to illustrate the variables and the interpretation of their importance. For example, if the confidence intervals of two predictors overlap, the predictor ranked higher by mean variable importance is not necessarily more important than the other. Thus, these confidence intervals allow additional interpretation and understanding of the predictors in the model, which is a common goal when analyzing a dataset.
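
The idea of bootstrapping per-tree importances can be illustrated with a minimal sketch. This is not the authors' implementation (their examples are in R, and the abstract does not specify the bootstrap variant); it assumes a percentile bootstrap of the mean importance across trees. The function names, the predictor names, and the per-tree importance values are all hypothetical, and the values are simulated only so the example runs on its own.

```python
import random
import statistics


def bootstrap_ci(per_tree_importance, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-tree importance values.

    Assumption: trees are resampled with replacement and the mean importance
    is recomputed on each resample. Returns (mean, lower, upper).
    """
    rng = random.Random(seed)
    n = len(per_tree_importance)
    boot_means = sorted(
        statistics.fmean(rng.choices(per_tree_importance, k=n))
        for _ in range(n_boot)
    )
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = int((1 - alpha / 2) * n_boot) - 1
    return (
        statistics.fmean(per_tree_importance),
        boot_means[lo_idx],
        boot_means[hi_idx],
    )


def intervals_overlap(a, b):
    """True if two (mean, lower, upper) intervals share any point."""
    return a[1] <= b[2] and b[1] <= a[2]


# Simulated per-tree importances for three hypothetical predictors
# (200 trees each). These are made-up numbers, not results from any dataset.
sim = random.Random(42)
per_tree = {
    "x1": [sim.gauss(0.30, 0.08) for _ in range(200)],
    "x2": [sim.gauss(0.29, 0.08) for _ in range(200)],
    "x3": [sim.gauss(0.10, 0.05) for _ in range(200)],
}

intervals = {name: bootstrap_ci(vals) for name, vals in per_tree.items()}
for name, (mean, lower, upper) in intervals.items():
    print(f"{name}: mean={mean:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")

print("x1 and x2 overlap:", intervals_overlap(intervals["x1"], intervals["x2"]))
```

In practice the per-tree values would come from a fitted forest rather than a simulation. In scikit-learn, for instance, each tree in a fitted forest's `estimators_` list exposes `feature_importances_`. When two intervals overlap, the ranking by mean importance alone does not establish that one predictor is more important than the other, which is the interpretation the abstract describes.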