Abstract Details

Activity Number: 630 - Machine Learning Applications
Type: Contributed
Date/Time: Thursday, August 3, 2017 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Learning and Data Science
Abstract #324818
Title: Comparison of Variable Selection Methods on a National Survey of Drug Use and Health
Author(s): Georgiy Bobashev* and Li-Tzy Wu
Companies: RTI International and Duke University
Keywords: variable selection ; Comparison of predictive methods ; National Survey ; Personalized treatment ; random forest ; LASSO
Abstract:

We examined how well demographics, substance use, and mental and other health indicators from multiple years of the National Survey on Drug Use and Health (NSDUH) predict an individual's propensity for multiple visits (< 3 per year) to Emergency Departments (ED). We compared the performance of stepwise regression, LASSO, classification trees, hybrid models, and random forests. The area under the curve (AUC) on a test set was 0.79, which is good for a national survey. The models were consistent in the predictors they selected across multiple independent datasets, but the selected set was sensitive to the variable selection method. Variable selection based on AUC can miss important variables (those indicating small but high-risk subpopulations) that do not contribute to population-level prediction but are of critical importance for personalized prediction. We also examined the role of sample size in model prediction and variable selection. While consistency in the choice of top predictors was not affected by increasing the sample size above a certain level, the identification of critical population subgroups benefits from larger samples. Sensitivity analysis showed low sensitivity to sampling weights.
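The point about AUC-based selection missing small high-risk subgroups can be illustrated with a minimal sketch. It is not the authors' analysis: the cell counts below are invented for illustration and are not NSDUH data, and the names `CELLS`, `grouped_auc`, and `score_groups` are hypothetical. The sketch scores people by the observed event rate of their group, once using only a common risk indicator `x` and once also using a rare subgroup indicator `z`, and compares the resulting AUCs.

```python
# Hypothetical, made-up counts for illustration only; NOT NSDUH data.
# Each cell: (x, z, n_people, n_events)
#   x = common risk indicator (about half the population)
#   z = rare subgroup indicator (about 1% of the population)
CELLS = [
    (0, 0, 5000, 250),   # 5% event rate
    (1, 0, 5000, 1000),  # 20% event rate
    (0, 1, 50, 40),      # 80% event rate in the rare subgroup
    (1, 1, 50, 40),      # 80% event rate in the rare subgroup
]


def grouped_auc(groups):
    """AUC from (score, n_pos, n_neg) groups; tied scores count as 1/2."""
    total_pos = sum(p for _, p, _ in groups)
    total_neg = sum(n for _, _, n in groups)
    wins = 0.0
    for s_pos, p, _ in groups:
        for s_neg, _, n in groups:
            if s_pos > s_neg:
                wins += p * n
            elif s_pos == s_neg:
                wins += 0.5 * p * n
    return wins / (total_pos * total_neg)


def score_groups(cells, key):
    """Pool cells by key(x, z) and score each pool by its event rate."""
    pooled = {}
    for x, z, n, events in cells:
        k = key(x, z)
        pos, neg = pooled.get(k, (0, 0))
        pooled[k] = (pos + events, neg + (n - events))
    return [(pos / (pos + neg), pos, neg) for pos, neg in pooled.values()]


# Model using only the common indicator x.
auc_x = grouped_auc(score_groups(CELLS, lambda x, z: x))
# Model that also uses the rare subgroup indicator z.
auc_xz = grouped_auc(score_groups(CELLS, lambda x, z: (x, z)))

total_n = sum(n for _, _, n, _ in CELLS)
total_events = sum(e for _, _, _, e in CELLS)
overall_rate = total_events / total_n

sub_n = sum(n for _, z, n, _ in CELLS if z == 1)
sub_events = sum(e for _, z, _, e in CELLS if z == 1)
subgroup_rate = sub_events / sub_n

print(f"AUC with x only:       {auc_x:.3f}")
print(f"AUC with x and z:      {auc_xz:.3f}")
print(f"Overall event rate:    {overall_rate:.3f}")
print(f"Rate in z=1 subgroup:  {subgroup_rate:.3f}")
```

With these invented counts, adding `z` raises the AUC by less than 0.03, so a selection rule that keeps only variables with a noticeable AUC gain could plausibly drop it, even though the event rate among people with `z = 1` is several times the overall rate.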


Authors who are presenting talks have a * after their name.

Copyright © American Statistical Association