JSM Preliminary Online Program
This is the preliminary program for the 2007 Joint Statistical Meetings in Salt Lake City, Utah.

The views expressed here are those of the individual authors
and not necessarily those of the ASA or its board, officers, or staff.







Activity Number: 251
Type: Contributed
Date/Time: Tuesday, July 31, 2007 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Computing
Abstract - #308346
Title: Random Forests for Feature Selection: To Be Handled with Caution
Author(s): Carolin Strobl*+
Companies: LMU Munich
Address: Ludwigstr 33, Munich, 80539, Germany
Keywords: feature selection ; random forest ; bagging ; subagging ; bootstrap ; bias
Abstract:

Variable importance measures for random forests are receiving increasing attention as a screening tool in high-dimensional classification tasks, e.g., in statistical genomics. However, we show and illustrate in simulation studies that these measures may artificially prefer suboptimal predictor variables when the predictors differ in their scale of measurement or their number of categories. Two statistical mechanisms underlie this deficiency: biased variable selection in the individual classification trees, and effects induced by bootstrap sampling with replacement. We present an alternative implementation of random forests and bagging that provides unbiased variable selection. When this method is applied with subsampling without replacement, the resulting variable importance measure can be used reliably as a screening tool in any data situation.
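
The first mechanism named in the abstract, biased variable selection in individual trees, can be illustrated with a small simulation. The sketch below is not the authors' simulation design or implementation. It assumes a binary response that is independent of every predictor, uses the Gini gain as the split criterion, and, as a simplification, treats each predictor's categories as ordered so that a predictor with k categories offers k - 1 candidate cutpoints. The function names (`gini`, `best_gini_gain`, `selection_frequencies`) and all parameter values are hypothetical and chosen only for illustration.

```python
import random


def gini(pos, n):
    """Gini impurity of a binary node with `pos` positives among `n` cases."""
    if n == 0:
        return 0.0
    p = pos / n
    return 2.0 * p * (1.0 - p)


def best_gini_gain(x, y, n_levels):
    """Largest Gini impurity reduction over all cutpoints of the form x <= c."""
    n = len(y)
    total_pos = sum(y)
    # Tabulate cases and positives per category level.
    count = [0] * n_levels
    pos = [0] * n_levels
    for xi, yi in zip(x, y):
        count[xi] += 1
        pos[xi] += yi
    parent = gini(total_pos, n)
    best = 0.0
    left_n = 0
    left_pos = 0
    # Sweep the k - 1 cutpoints, updating the left child's counts as we go.
    for c in range(n_levels - 1):
        left_n += count[c]
        left_pos += pos[c]
        right_n = n - left_n
        right_pos = total_pos - left_pos
        child = (left_n * gini(left_pos, left_n)
                 + right_n * gini(right_pos, right_n)) / n
        best = max(best, parent - child)
    return best


def selection_frequencies(levels=(2, 4, 10, 20), n=100, reps=1000, seed=1):
    """Share of replications in which each predictor yields the best split.

    All predictors are pure noise, so an unbiased criterion would pick
    each of them about equally often.
    """
    rng = random.Random(seed)
    wins = [0] * len(levels)
    for _ in range(reps):
        y = [rng.randint(0, 1) for _ in range(n)]
        gains = []
        for k in levels:
            x = [rng.randrange(k) for _ in range(n)]
            gains.append(best_gini_gain(x, y, k))
        winner = max(range(len(levels)), key=lambda j: gains[j])
        wins[winner] += 1
    return [w / reps for w in wins]


if __name__ == "__main__":
    for k, share in zip((2, 4, 10, 20), selection_frequencies()):
        print(f"{k:2d} categories: selected in {share:.1%} of replications")
```

Because no predictor carries information about the response, each of the four would be chosen in roughly a quarter of the replications under an unbiased criterion. Taking the maximum gain over more candidate cutpoints instead tends to favor predictors with more categories, which is the kind of artificial preference the abstract warns about. The second mechanism, the effect of bootstrap sampling with replacement, is not covered by this sketch.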


  • The address information is for the authors that have a + after their name.
  • Authors who are presenting talks have a * after their name.


Revised September, 2007