| Activity Number: | 144 |
| Type: | Contributed |
| Date/Time: | Monday, July 30, 2007 : 10:30 AM to 12:20 PM |
| Sponsor: | Biometrics Section |
| Abstract - #309893 |
| Title: | Variable Importance Selection: Random Forest vs. Logistic Regression |
| Author(s): | Andrejus Parfionovas*+ and Adele Cutler |
| Companies: | Utah State University and Utah State University |
| Address: | 563 N 700 E apt 9, Logan, UT, 84321, |
| Keywords: | variable selection ; Random Forest ; Logistic Regression ; multicollinearity ; classification ; cardiac events |
| Abstract: | We demonstrate the efficiency of variable selection for multivariate analysis using Random Forests (RF) and compare it to Logistic Regression (LR) using numerical simulations. RF demonstrates a higher success rate in choosing statistically important variables on highly correlated and/or noisy data. We identified an essential difference between the variable selection mechanisms of the two approaches: RF assigns importance to each variable based on its explanatory value, while LR focuses on a subset of variables that provides sufficient explanatory effect, thus completely ignoring other variables of possible interest. RF provides more informative insight into the data and is more robust and stable. Using the Kolmogorov-Smirnov test, we propose a comprehensible and easy-to-use method for comparing variable importance, illustrated on a real-life data example (cardiac event prediction). |
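The abstract does not spell out how the Kolmogorov-Smirnov test is applied to variable importance. One plausible reading, offered here only as an assumption, is that the importance scores of two variables are collected over repeated model fits and the two resulting samples are compared. The sketch below computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical distribution functions) in plain Python. The function name `ks_statistic` and the importance values are hypothetical and are not taken from the authors' study.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute difference between the empirical
    cumulative distribution functions (ECDFs) of the two samples.
    """
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    n, m = len(a), len(b)
    d = 0.0
    # Both ECDFs are step functions that change only at observed values,
    # so it is enough to evaluate the gap at every observation.
    for x in a + b:
        cdf_a = bisect_right(a, x) / n  # fraction of sample_a <= x
        cdf_b = bisect_right(b, x) / m  # fraction of sample_b <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical importance scores for two variables over five repeated fits
# (illustrative numbers only, not results from the abstract).
importance_x1 = [0.30, 0.32, 0.29, 0.31, 0.33]
importance_x2 = [0.10, 0.12, 0.31, 0.11, 0.09]

d = ks_statistic(importance_x1, importance_x2)
print(f"KS statistic: {d:.2f}")
```

A statistic near 1 means the two importance distributions barely overlap, while a value near 0 means they are nearly indistinguishable. Turning the statistic into a p-value requires the Kolmogorov distribution or a permutation scheme, which this sketch leaves out.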