All Times EDT

Abstract Details

Activity Number: 60 - Invited E-Poster Session II
Type: Invited
Date/Time: Sunday, August 8, 2021 : 6:45 PM to 7:30 PM
Sponsor: Section on Statistics and Data Science Education
Abstract #317376
Title: Methods to Reducing Bias in the Gini Variable Importance Measure of Categorical Variables for Random Forest Models
Author(s): Elisha Daniel Johnston*
Companies: El Camino College
Keywords: Random Forest; Gini Variable Importance Measure; Machine Learning; Monte Carlo Simulation
Abstract:

Random forest (RF) classification algorithms obscure the relative importance of variables when they randomly select observations and predictors while building and aggregating classification trees (CTs). Many practitioners use the Gini variable importance measure (GVIM) to assess variables’ relative contributions to the final prediction, but GVIM favors qualitative variables with more categories. This research develops a regression-based bias-corrected GVIM (RBG) that regresses GVIM under the null (no association) on the number of categories, then obtains the corrected GVIM by subtracting the regression-predicted GVIM from the raw GVIM. To investigate performance, I conducted a Monte Carlo study that varies (1) the number of categories within qualitative predictors and (2) the level of association of the predictors with the outcome. The simulation results indicate that when the predictors are strongly associated with the outcome, RBG provides a more accurate correction, implying a reduction in bias. Therefore, RBG holds promise to improve the assessment of variable importance in some settings.
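
The two steps named in the abstract (regress null importance on the number of categories, then subtract the predicted value from the raw importance) can be illustrated with a minimal sketch. This is not the author's implementation. It assumes a simplified stand-in for GVIM, namely the Gini impurity decrease from a single multiway split on one categorical predictor, rather than an importance aggregated over a full random forest. The function names (gini_decrease, null_importance, fit_line, rbg_correct), the binary outcome, the sample size, and the category counts are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_decrease(x, y):
    """Impurity decrease from one multiway split of y on categorical x.

    ASSUMPTION: a single-split stand-in for a forest's GVIM.
    """
    n = len(y)
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    weighted = sum(len(g) / n * gini(g) for g in groups.values())
    return gini(y) - weighted


def null_importance(k, n, reps, rng):
    """Mean importance of a k-category predictor unrelated to the outcome."""
    total = 0.0
    for _ in range(reps):
        x = [rng.randrange(k) for _ in range(n)]  # predictor, k categories
        y = [rng.randrange(2) for _ in range(n)]  # binary outcome, independent of x
        total += gini_decrease(x, y)
    return total / reps


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def rbg_correct(raw, k, intercept, slope):
    """Corrected importance: raw value minus the regression-predicted null value."""
    return raw - (intercept + slope * k)


rng = random.Random(0)  # fixed seed so the simulation is repeatable
category_counts = [2, 4, 8, 16]  # hypothetical grid of category counts

# Step 1: simulate importance under the null and regress it on category count.
null_means = [null_importance(k, n=200, reps=200, rng=rng) for k in category_counts]
intercept, slope = fit_line(category_counts, null_means)

# Step 2: subtract the predicted null importance from each raw importance.
corrected_null = [
    rbg_correct(m, k, intercept, slope) for m, k in zip(null_means, category_counts)
]
```

In this sketch the fitted slope is positive because the null importance grows with the number of categories, which is the bias the abstract describes. To apply the correction to a fitted model, rbg_correct would take each predictor's raw importance and its category count in place of the null means used here.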


Authors who are presenting talks have a * after their name.

Back to the full JSM 2021 program