Online Program

Friday, February 15
Fri, Feb 15, 5:15 PM - 6:30 PM
St. James Ballroom
Poster Session 2 and Refreshments

A Comparison of Random Forest Variable Selection Methods for Classification Modeling (303814)

Eddie Ip, Wake Forest University School of Medicine 
Mike Miller, Wake Forest University School of Medicine 
*Jaime Lynn Speiser, Wake Forest University School of Medicine 
Janet Tooze, Wake Forest University School of Medicine 

Keywords: random forest, variable selection, prediction modeling

Random forest classification is a popular machine learning method for developing prediction models. In prediction modeling, a common goal is to reduce the number of variables needed to obtain a prediction, which lessens the burden of data collection and improves efficiency. Several variable selection methods exist for random forest classification; however, little literature guides users on which method is preferable for different types of datasets. Using 311 classification datasets freely available online, we evaluate variable selection methods on prediction error rate, number of variables, and computation time. A significant contribution of our study is a direct comparison of variable selection techniques for random forest classification across many datasets.
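
The kind of comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code, and every detail in it is an assumption made for illustration: the abstract does not name the selection methods studied, so simple permutation-importance thresholding stands in for them; a small forest of bagged depth-1 trees stands in for a full random forest; and synthetic data (two informative variables, four noise variables) stands in for the 311 datasets. All function names are hypothetical. The sketch records the three quantities the abstract lists: error rate, number of variables, and computation time.

```python
import random
import time

def make_data(n, rng, n_noise=4):
    """Synthetic two-class data (assumed): variables 0 and 1 are informative, the rest noise."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        row = [rng.gauss(2.0 * label, 1.0), rng.gauss(-2.0 * label, 1.0)]
        row += [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append(row)
        y.append(label)
    return X, y

def fit_stump(X, y, candidates, rng):
    """Depth-1 tree: choose the (variable, threshold, direction) with the fewest training errors."""
    best = None
    for j in candidates:
        for _ in range(10):  # try a few thresholds drawn from observed values
            t = rng.choice(X)[j]
            errors = sum((row[j] > t) != bool(label) for row, label in zip(X, y))
            for flipped, e in ((False, errors), (True, len(y) - errors)):
                if best is None or e < best[0]:
                    best = (e, j, t, flipped)
    return best[1:]

def fit_forest(X, y, variables, rng, n_trees=25):
    """Bagged stumps, each restricted to a random subset of about sqrt(p) variables."""
    mtry = max(1, round(len(variables) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample of the rows
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        forest.append(fit_stump(Xb, yb, rng.sample(variables, mtry), rng))
    return forest

def predict(forest, row):
    """Majority vote over the stumps."""
    votes = sum((row[j] > t) != flipped for j, t, flipped in forest)
    return int(2 * votes > len(forest))

def error_rate(forest, X, y):
    return sum(predict(forest, row) != label for row, label in zip(X, y)) / len(y)

def permutation_importance(forest, X, y, j, rng):
    """Increase in error after shuffling variable j across rows."""
    column = [row[j] for row in X]
    rng.shuffle(column)
    X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
    return error_rate(forest, X_perm, y) - error_rate(forest, X, y)

def compare(seed=0):
    """Record variables kept, test error, and wall-clock time with and without selection."""
    rng = random.Random(seed)
    X_train, y_train = make_data(300, rng)
    X_valid, y_valid = make_data(200, rng)
    X_test, y_test = make_data(200, rng)
    all_vars = list(range(len(X_train[0])))
    results = {}

    start = time.perf_counter()
    full = fit_forest(X_train, y_train, all_vars, rng)
    results["no selection"] = {
        "variables": all_vars,
        "error": error_rate(full, X_test, y_test),
        "seconds": time.perf_counter() - start,
    }

    start = time.perf_counter()
    screen = fit_forest(X_train, y_train, all_vars, rng)
    importance = {j: permutation_importance(screen, X_valid, y_valid, j, rng)
                  for j in all_vars}
    # Keep variables whose shuffling increases validation error; fall back to the top one.
    selected = [j for j in all_vars if importance[j] > 0] or [max(importance, key=importance.get)]
    reduced = fit_forest(X_train, y_train, selected, rng)
    results["permutation importance"] = {
        "variables": selected,
        "error": error_rate(reduced, X_test, y_test),
        "seconds": time.perf_counter() - start,
    }
    return results

if __name__ == "__main__":
    for name, r in compare().items():
        print(name, len(r["variables"]), round(r["error"], 3), round(r["seconds"], 4))
```

Importance is measured on a validation split and error on a separate test split, so the selection step does not see the data used to score it. A fuller comparison would presumably loop this over many real datasets and several selection methods, using an established random forest implementation rather than this stand-in.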