Abstract Details
Activity Number: 640
Type: Topic Contributed
Date/Time: Thursday, August 7, 2014, 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Computing
Abstract #312178
Title: A Forest Measure of Variable Importance Resistant to Correlations
Author(s): Padraic Neville*+ and Pei-Yi Tan
Companies: SAS Institute and SAS Institute
Keywords: Random Forests; SAS; Variable Importance; Decision Tree
Abstract:
Variable importance estimates that are output from decision trees and random forests are often used to reduce the dimension of data, especially in the presence of many variables, because decision trees can process many variables quickly. However, trees typically inflate the importance of correlated variables and even promote irrelevant correlated variables above predictive independent variables. Strobl et al. (2008) analyze the cause and propose a remedy. Unfortunately, the remedy is too complex to be practical for a large number of observations. This paper presents a simple method, called random branch assignments, which conforms to the analysis of Strobl et al. and yet can handle many observations. Although the method still incorrectly ranks the variables when the signal-to-noise ratio is less than 1, it is dramatically less sensitive to correlation effects than the measures of variable importance in the randomForest() function in R.
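The abstract names the method but does not spell out its steps. The following is a minimal sketch of one plausible reading of "random branch assignments", not the authors' implementation: to score a variable, each observation is routed through a fitted tree as usual, except that any split on that variable is replaced by a random branch draw, and the resulting increase in error is taken as the importance. The tree layout, the branch probabilities (proportional to training counts), the squared-error loss, and every name below (`node`, `leaf`, `predict`, `rba_importance`) are assumptions made for illustration.

```python
import random

# Hypothetical tree layout: internal nodes store the split variable, a
# threshold, and how many training observations went to each branch;
# leaves store a single predicted value.
def leaf(value):
    return {"value": value}

def node(var, threshold, left, right, n_left, n_right):
    return {"var": var, "threshold": threshold, "left": left, "right": right,
            "n_left": n_left, "n_right": n_right}

def predict(tree, row, randomized_var=None, rng=random):
    """Route one observation to a leaf and return the leaf's value.

    Splits on `randomized_var` ignore the observation's value and pick a
    branch at random, weighted by the training counts (an assumption).
    """
    while "value" not in tree:
        if tree["var"] == randomized_var:
            p_left = tree["n_left"] / (tree["n_left"] + tree["n_right"])
            go_left = rng.random() < p_left
        else:
            go_left = row[tree["var"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["value"]

def mse(tree, rows, targets, randomized_var=None, rng=random):
    """Mean squared error of the tree, optionally with one variable randomized."""
    errors = [(predict(tree, r, randomized_var, rng) - y) ** 2
              for r, y in zip(rows, targets)]
    return sum(errors) / len(errors)

def rba_importance(tree, rows, targets, var, repeats=20, seed=0):
    """Average increase in error when splits on `var` are randomized."""
    rng = random.Random(seed)
    baseline = mse(tree, rows, targets)
    randomized = sum(mse(tree, rows, targets, var, rng)
                     for _ in range(repeats)) / repeats
    return randomized - baseline

# Toy usage: the tree splits only on x1, so x2 never influences routing.
toy_tree = node("x1", 0.5, leaf(0.0), leaf(1.0), n_left=2, n_right=2)
toy_rows = [{"x1": 0.1, "x2": 0.1}, {"x1": 0.2, "x2": 0.9},
            {"x1": 0.8, "x2": 0.2}, {"x1": 0.9, "x2": 0.8}]
toy_targets = [0.0, 0.0, 1.0, 1.0]
importance_x1 = rba_importance(toy_tree, toy_rows, toy_targets, "x1")
importance_x2 = rba_importance(toy_tree, toy_rows, toy_targets, "x2")
```

In this sketch a variable that no split uses receives an importance of exactly zero, however strongly it correlates with a variable that is used, because the observations themselves are never altered. For a forest, the same difference would presumably be averaged over trees.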
Authors who are presenting talks have a * after their name.
2014 JSM Online Program Home
The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.
Copyright © American Statistical Association.