Friday, February 15

Fri, Feb 15, 5:15 PM - 6:30 PM
St. James Ballroom

Poster Session 2 and Refreshments

A Comparison of Random Forest Variable Selection Methods for Classification Modeling (303814)

Eddie Ip, Wake Forest University School of Medicine
Mike Miller, Wake Forest University School of Medicine
*Jaime Lynn Speiser, Wake Forest University School of Medicine
Janet Tooze, Wake Forest University School of Medicine

Keywords: random forest, variable selection, prediction modeling

Random forest classification is a popular machine learning method for developing prediction models. In prediction modeling, a common goal is to reduce the number of variables needed to obtain a prediction, which lessens the burden of data collection and improves efficiency. Several variable selection methods exist for the setting of random forest classification; however, there is little literature to guide users on which method may be preferable for different types of datasets. Using 311 classification datasets freely available online, we evaluate the prediction error rates, number of variables selected, and computation times of these variable selection methods. A significant contribution of our study is a comparison of different variable selection techniques in the setting of random forest classification.
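
As a rough illustration of the three quantities the abstract evaluates (prediction error rate, number of variables, and computation time), the following is a minimal sketch for a single variable selection method. It assumes scikit-learn is available. The selection rule (SelectFromModel keeping variables with at least mean importance) is an illustrative stand-in and not necessarily one of the methods compared in the study. The synthetic data stands in for a real classification dataset, and the function name evaluate_selection is hypothetical.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split


def evaluate_selection(X_train, X_test, y_train, y_test, seed=0):
    """Return (error_rate, n_selected, seconds) for one selection method."""
    start = time.perf_counter()

    # Assumed selection rule (illustrative only): keep variables whose
    # impurity-based importance is at least the mean importance.
    selector = SelectFromModel(
        RandomForestClassifier(n_estimators=100, random_state=seed),
        threshold="mean",
    ).fit(X_train, y_train)

    # Refit a forest using only the selected variables.
    X_train_sel = selector.transform(X_train)
    X_test_sel = selector.transform(X_test)
    forest = RandomForestClassifier(n_estimators=100, random_state=seed)
    forest.fit(X_train_sel, y_train)

    # Time covers selection plus refitting, not scoring.
    elapsed = time.perf_counter() - start
    error_rate = 1.0 - forest.score(X_test_sel, y_test)
    return error_rate, X_train_sel.shape[1], elapsed


# Synthetic stand-in data; the study itself used real datasets.
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
error_rate, n_selected, seconds = evaluate_selection(
    X_train, X_test, y_train, y_test
)
print(f"error rate={error_rate:.3f}, variables kept={n_selected}, "
      f"seconds={seconds:.2f}")
```

Comparing several methods would amount to swapping the selector inside the function and repeating the measurement across datasets. A single train/test split is used here for brevity, and a more careful comparison would likely use repeated splits or cross-validation.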

American Statistical Association