
Abstract Details

Activity Number: 425 - SPEED: Reliable Statistical Learning and Data Science
Type: Contributed
Date/Time: Tuesday, August 1, 2017 : 3:05 PM to 3:50 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #325293
Title: A Simulation Study to Evaluate Variable Importance Measures in Random and Conditional Inference Forests with Imputation for Data with Missing and Correlated Predictors
Author(s): Hung-Wen Yeh* and Rayus Kuplicki and Trang Le and Martin P. Paulus
Companies: and Laureate Institute for Brain Research and University of Tulsa and Laureate Institute for Brain Research
Keywords: random forest; conditional inference forest; correlated predictors; missing values; imputation; variable importance measures
Abstract:

Random forest (RF) is a powerful tool for statistical learning. With the aid of variable importance measures (VIMs), RF can rank predictors by importance, and these rankings can be used for feature selection. However, recent research has shown that VIMs are biased when predictors are correlated and when data contain missing values. Imputation is a well-established method for handling missing values, but it is still unclear how VIMs perform after imputation under different missing-data mechanisms. A simulation study was conducted to explore this issue: a response and correlated predictors were simulated to contain missing values, which were imputed by multivariate imputation by chained equations (MICE); the imputed data were then analyzed by RF and by conditional inference forest (CIF). Three VIMs were compared: selection frequency (SF), unconditional permutation importance (UPI), and conditional permutation importance (CPI; CIF only). Results suggest that SF and CPI are more robust than the conventional UPI for data with missing values and/or correlated predictors. We recommend imputing missing values before applying CIF, and using SF or CPI to reduce the bias due to correlated predictors.
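To make the simulation pipeline concrete, here is a minimal sketch of one replicate in Python with NumPy and scikit-learn: simulate correlated predictors, inject missing values under MCAR, MAR, or MNAR, impute with chained equations, fit a random forest, and compute unconditional permutation importance. Everything specific here is a hypothetical choice, not the authors' design: the sample size, correlation `rho`, coefficients, which column goes missing, and the missingness rules (missing values are placed in predictor X2 only, not in the response). scikit-learn's `IterativeImputer` is used as a chained-equations imputer, but by default it returns a single imputation rather than the multiple imputations of the R `mice` package, and `permutation_importance` is computed on the training data rather than on out-of-bag samples as in Breiman's original UPI. CIF, SF, and CPI are not reproduced; in R they are commonly obtained with `party::cforest` and `varimp(..., conditional = TRUE)`.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (unlocks IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance


def simulate(n=500, rho=0.8, seed=0):
    """Hypothetical design: X1..X3 share pairwise correlation rho; X4..X6 are independent.
    Only X1 and X4 affect the response."""
    rng = np.random.default_rng(seed)
    p = 6
    cov = np.eye(p)
    cov[:3, :3] = rho           # correlated block for X1, X2, X3
    np.fill_diagonal(cov, 1.0)  # unit variances
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    beta = np.array([2.0, 0.0, 0.0, 1.0, 0.0, 0.0])
    y = X @ beta + rng.normal(scale=1.0, size=n)
    return X, y


def add_missing(X, rate=0.2, mechanism="MCAR", seed=1):
    """Set a fraction `rate` of X2 (column index 1) to NaN under a chosen mechanism."""
    rng = np.random.default_rng(seed)
    X = X.copy()
    target = 1
    if mechanism == "MCAR":
        # Missingness unrelated to any data value.
        mask = rng.random(X.shape[0]) < rate
    elif mechanism == "MAR":
        # Missingness in X2 driven by the always-observed X4 (largest X4 values go missing).
        driver = X[:, 3]
        mask = driver > np.quantile(driver, 1 - rate)
    elif mechanism == "MNAR":
        # Missingness in X2 driven by X2's own (unobserved) value.
        driver = X[:, target]
        mask = driver > np.quantile(driver, 1 - rate)
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    X[mask, target] = np.nan
    return X


def importance_after_imputation(X_miss, y, seed=0):
    """Impute by chained equations, fit an RF, and return mean permutation importance per column."""
    X_imp = IterativeImputer(max_iter=10, random_state=seed).fit_transform(X_miss)
    rf = RandomForestRegressor(n_estimators=200, random_state=seed).fit(X_imp, y)
    upi = permutation_importance(rf, X_imp, y, n_repeats=10, random_state=seed)
    return upi.importances_mean


if __name__ == "__main__":
    X, y = simulate()
    for mech in ("MCAR", "MAR", "MNAR"):
        vim = importance_after_imputation(add_missing(X, mechanism=mech), y)
        print(mech, np.round(vim, 3))
```

A full study would repeat this over many seeds, missingness rates, and correlation levels, and compare each VIM's ranking against the known informative predictors (here X1 and X4); correlated but uninformative X2 and X3 are where UPI is expected to show inflated importance.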


Authors who are presenting talks have a * after their name.

Copyright © American Statistical Association