JSM 2014 Home

Abstract Details

Activity Number: 640
Type: Topic Contributed
Date/Time: Thursday, August 7, 2014 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Computing
Abstract #312178
Title: A Forest Measure of Variable Importance Resistant to Correlations
Author(s): Padraic Neville*+ and Pei-Yi Tan
Companies: SAS Institute and SAS Institute
Keywords: Random Forests ; SAS ; Variable Importance ; Decision Tree
Abstract:

Variable importance estimates that are output from decision trees and random forests are often used to reduce the dimension of data, especially in the presence of many variables, because decision trees can process many variables quickly. However, trees typically inflate the importance of correlated variables and even promote irrelevant correlated variables above predictive independent variables. Strobl et al. (2008) analyze the cause and propose a remedy. Unfortunately, the remedy is too complex to be practical for a large number of observations. This paper presents a simple method, called random branch assignments, which conforms to the analysis of Strobl et al. and yet can handle many observations. Although the method still incorrectly ranks the variables when the signal-to-noise ratio is less than 1, it is dramatically less sensitive to correlation effects than the measures of variable importance in the randomForest() function in R.
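
The abstract does not spell out how random branch assignments work, so the following is only a rough sketch of one plausible reading, not the authors' implementation. It assumes that, to score a variable, every split on that variable is replaced by a random choice of branch (weighted by the training counts in each child), and that importance is the resulting increase in mean squared error. The tree layout, the helper names (`leaf`, `node`, `predict`, `rba_importance`), and the toy data are all hypothetical, and a single hand-built tree stands in for a forest.

```python
import random

# Hypothetical tree layout (an assumption, not taken from the abstract):
# internal nodes store the split variable index, a threshold, the two
# children, and the training counts sent to each child; leaves store a value.
def leaf(value):
    return {"value": value}

def node(var, threshold, left, right, n_left, n_right):
    return {"var": var, "threshold": threshold, "left": left, "right": right,
            "n_left": n_left, "n_right": n_right}

def predict(tree, x, randomized_var=None, rng=None):
    """Walk the tree; at splits on randomized_var, pick a branch at random."""
    current = tree
    while "value" not in current:
        if current["var"] == randomized_var:
            # Assumed rule: branch probability proportional to training counts.
            p_left = current["n_left"] / (current["n_left"] + current["n_right"])
            go_left = rng.random() < p_left
        else:
            go_left = x[current["var"]] <= current["threshold"]
        current = current["left"] if go_left else current["right"]
    return current["value"]

def mse(tree, X, y, randomized_var=None, rng=None):
    errors = [(predict(tree, x, randomized_var, rng) - t) ** 2
              for x, t in zip(X, y)]
    return sum(errors) / len(errors)

def rba_importance(tree, X, y, var, seed=0, repeats=20):
    """Average increase in MSE when splits on `var` are assigned at random."""
    rng = random.Random(seed)
    baseline = mse(tree, X, y)
    randomized = sum(mse(tree, X, y, var, rng) for _ in range(repeats)) / repeats
    return randomized - baseline

# Toy data: the target depends only on variable 0; variable 1 is irrelevant.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]] * 25
y = [row[0] for row in X]
tree = node(0, 0.5, leaf(0.0), leaf(1.0), n_left=50, n_right=50)

importance_x0 = rba_importance(tree, X, y, var=0)
importance_x1 = rba_importance(tree, X, y, var=1)
```

Under these assumptions, a variable that no split uses gets an importance of exactly zero, because its values are never read and nothing is permuted. That is one way such a measure could avoid rewarding a variable merely for being correlated with a predictive one. Extending the sketch to a forest would mean averaging the same quantity over trees.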


Authors who are presenting talks have a * after their name.

