Abstract:
|
We examined how well demographics, substance use, and mental and other health indicators from multiple years of the National Survey on Drug Use and Health (NSDUH) predict an individual's propensity for multiple visits (three or more per year) to emergency departments (EDs). We compared the performance of stepwise regression, LASSO, classification trees, hybrid models, and random forests. The area under the ROC curve (AUC) on a held-out test set was 0.79, which is good for a national survey. The models selected predictors consistently across multiple independent datasets but were sensitive to the variable selection method. Variable selection based on AUC can miss important variables that identify small but high-risk subpopulations; such variables contribute little to population-level prediction but are critically important for personalized prediction. We also examined how sample size affects model prediction and variable selection. Above a certain sample size, further increases did not affect the consistency with which the top predictors were chosen, whereas identifying critical population subgroups benefited from larger samples. A sensitivity analysis showed that the results were largely insensitive to sampling weights.
|
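The abstract does not state how the test-set AUC or the sampling-weight sensitivity check was computed. As an illustration only, here is a minimal pure-Python sketch, assuming a binary outcome of three or more ED visits per year and the rank-based (Mann-Whitney) definition of AUC, optionally weighted by survey weights. The function names, scores, and weights below are hypothetical toy values, not NSDUH data or the authors' code.

```python
from itertools import product


def frequent_ed_user(visits_per_year: int, threshold: int = 3) -> int:
    """Hypothetical outcome coding: 1 if visits >= threshold (assumed cutoff of 3), else 0."""
    return int(visits_per_year >= threshold)


def weighted_auc(scores, labels, weights=None):
    """AUC as the (weighted) probability that a positive case scores higher
    than a negative case; ties count as 1/2. With weights=None every
    respondent gets weight 1, which gives the ordinary unweighted AUC."""
    if weights is None:
        weights = [1.0] * len(scores)
    pos = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 1]
    neg = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    num = 0.0
    den = 0.0
    # Compare every positive/negative pair; each pair counts with weight w_pos * w_neg.
    for (s_pos, w_pos), (s_neg, w_neg) in product(pos, neg):
        pair_w = w_pos * w_neg
        den += pair_w
        if s_pos > s_neg:
            num += pair_w
        elif s_pos == s_neg:
            num += 0.5 * pair_w
    return num / den


# Toy test set (hypothetical): reported ED visits, model risk scores, survey weights.
visits = [0, 1, 4, 3, 0, 5, 2, 0]
scores = [0.10, 0.30, 0.80, 0.25, 0.20, 0.70, 0.35, 0.05]
weights = [1.2, 0.8, 1.0, 2.0, 1.5, 0.5, 1.0, 1.0]

labels = [frequent_ed_user(v) for v in visits]
unweighted = weighted_auc(scores, labels)
weighted = weighted_auc(scores, labels, weights)
print(round(unweighted, 3), round(weighted, 3))  # prints 0.867 0.813
```

Comparing the unweighted and weighted AUC on the same test set is one simple way to gauge sensitivity to sampling weights. The pairwise loop is quadratic in sample size and is meant for clarity; a rank-based formula would be preferable for survey-scale data.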