JSM 2017 Online Program

Activity Number:	131 - Predictive Modeling in Data Science
Type:	Contributed
Date/Time:	Monday, July 31, 2017 : 8:30 AM to 10:20 AM
Sponsor:	Section on Statistical Learning and Data Science
Abstract #324521
Title:	Variable Selection Using Intersection and Average of Random Forests
Author(s):	Faraz Niyaghi* and Sharmodeep Bhattacharyya and Sarah C Emerson
Companies:	Oregon State University and Oregon State University and Oregon State University
Keywords:	Variable Selection ; Random Forest ; Intersection of Forests ; Stability
Abstract:	Random forest (RF) has demonstrated the ability to select important variables and to model complex data. However, because the RF algorithm randomly samples both data points and variables, the rankings of the selected variables can vary among models fitted to the same data set. As a result, a noise variable can be selected over a main variable. This research investigates two methods, intersection and average, for stabilizing RF's variable selection. First, multiple RF models are fitted to the data, and the ranking of the variables and their relative importance are evaluated for each model. The average method then ranks the variables by their mean relative importance across the models. The intersection method iteratively selects the variables that are common to the top-ranked variables of these models. These methods also showed potential for detecting main effects in interaction terms.
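The two aggregation steps can be illustrated with a minimal Python sketch. It assumes each fitted forest's importance scores are already available as a mapping from variable name to score (for example, read from scikit-learn's `feature_importances_` attribute on a fitted `RandomForestRegressor`). Several details are assumptions, not the authors' published procedure: "relative importance" is taken as each score divided by that model's largest score; the intersection method is modeled as growing k = 1, 2, ... and recording each variable the first time it appears in every model's top-k list; ties are broken by summed rank and then by name. All function names are hypothetical, and the importance values below are made up for illustration, not results from the study.

```python
from typing import Dict, List, Sequence

# One fitted model's importances: variable name -> importance score.
Importances = Dict[str, float]


def relative_importance(imp: Importances) -> Importances:
    """Scale one model's scores so its top variable has relative importance 1.

    Assumption: relative importance = importance / max importance in that model.
    """
    top = max(imp.values())
    return {v: s / top for v, s in imp.items()}


def average_ranking(models: Sequence[Importances]) -> List[str]:
    """Average method: rank variables by mean relative importance over all models."""
    rel = [relative_importance(m) for m in models]
    variables = list(rel[0])
    mean = {v: sum(r[v] for r in rel) / len(rel) for v in variables}
    # Highest mean first; ties broken alphabetically for a deterministic order.
    return sorted(variables, key=lambda v: (-mean[v], v))


def intersection_ranking(models: Sequence[Importances]) -> List[str]:
    """Intersection method (assumed form): for k = 1, 2, ..., intersect every
    model's top-k list and append variables the first time they appear in it."""
    ranked = [sorted(m, key=m.get, reverse=True) for m in models]
    selected: List[str] = []
    for k in range(1, len(ranked[0]) + 1):
        common = set.intersection(*(set(r[:k]) for r in ranked))
        new = [v for v in common if v not in selected]
        # Variables entering at the same k are ordered by summed rank, then name.
        new.sort(key=lambda v: (sum(r.index(v) for r in ranked), v))
        selected.extend(new)
    return selected


# Hypothetical importances from three forests fitted to the same data set.
# In the second model, the noise variable outranks the main variable x2.
models = [
    {"x1": 0.40, "x2": 0.30, "noise": 0.20, "x3": 0.10},
    {"x1": 0.35, "noise": 0.30, "x2": 0.25, "x3": 0.10},
    {"x2": 0.45, "x1": 0.30, "x3": 0.15, "noise": 0.10},
]

print(average_ranking(models))       # ['x1', 'x2', 'noise', 'x3']
print(intersection_ranking(models))  # ['x1', 'x2', 'noise', 'x3']
```

In this toy input, both aggregated rankings place x2 ahead of the noise variable even though one individual model does not, which is the kind of stabilization the abstract describes. Taking the first few entries of either list gives a selected variable subset.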

Authors who are presenting talks have a * after their name.