
Abstract Details

Activity Number: 623 - Statistical Modeling: Benefits and Drawbacks
Type: Contributed
Date/Time: Thursday, August 1, 2019 : 8:30 AM to 10:20 AM
Sponsor: Survey Research Methods Section
Abstract #307337
Title: Statistical Learning for Complex Survey Data: Using Cross-Validation for Model Selection in Generalized Linear Models
Author(s): Darryl Creel*

Identifying the “best” set of independent variables is a common challenge when building statistical models. To choose the variables to include in their models, many analysts use automated variable selection methods, e.g., forward selection, backward elimination, stepwise selection, and single-variable screening. In simple random samples, these selection methods have been criticized for producing an upward bias in the regression coefficients and a downward bias in the standard errors, which may result in the selection of a suboptimal set of independent variables. Despite these criticisms, the same selection methods are being applied to complex survey data. To avoid these problems, our approach to finding the “best” model for complex survey data is to use k-fold cross-validation to directly estimate the test error. We estimate the test error for all possible subsets of independent variables and consider the subsets with the smallest test errors as candidate models. In this paper, we demonstrate an application of this approach for regression and logistic regression models using complex survey data and discuss the additional challenges related to the complex survey design.
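The all-subsets cross-validation idea described in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: it covers only the linear regression case, and it assumes that survey weights enter both the fit (weighted least squares) and the test error (weighted mean squared error), and that folds are formed from whole clusters so that sampled units from the same cluster never straddle training and test sets. The function names `weighted_ls`, `cv_test_error`, and `all_subsets_cv`, and the `clusters` argument, are hypothetical. Stratification and other design features are ignored here.

```python
import itertools
import numpy as np


def weighted_ls(X, y, w):
    """Weighted least squares fit; returns the coefficient vector."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


def cv_test_error(X, y, w, clusters, k=5, seed=0):
    """Weighted k-fold CV mean squared error.

    Assumption: whole clusters are assigned to folds, so units from one
    cluster never appear in both the training and the test data.
    """
    rng = np.random.default_rng(seed)
    ids = rng.permutation(np.unique(clusters))
    fold_of_cluster = {c: i % k for i, c in enumerate(ids)}
    folds = np.array([fold_of_cluster[c] for c in clusters])

    sse, wsum = 0.0, 0.0
    for f in range(k):
        test = folds == f
        train = ~test
        beta = weighted_ls(X[train], y[train], w[train])
        resid = y[test] - X[test] @ beta
        sse += np.sum(w[test] * resid ** 2)
        wsum += np.sum(w[test])
    return sse / wsum


def all_subsets_cv(X, y, w, clusters, k=5, seed=0):
    """Estimate the CV test error for every subset of the columns of X.

    Returns a list of (test_error, subset) pairs sorted by test error.
    An intercept is always included; the empty subset is intercept-only.
    """
    n, p = X.shape
    intercept = np.ones((n, 1))
    results = []
    for size in range(p + 1):
        for subset in itertools.combinations(range(p), size):
            cols = np.array(subset, dtype=int)
            Xs = np.hstack([intercept, X[:, cols]])
            err = cv_test_error(Xs, y, w, clusters, k=k, seed=seed)
            results.append((err, subset))
    return sorted(results)
```

Calling `all_subsets_cv(X, y, w, clusters)` returns every subset ranked by estimated test error, so the first few entries play the role of the candidate models. The number of subsets grows as 2^p, so this exhaustive search is only practical for a modest number of independent variables.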

Authors who are presenting talks have a * after their name.

Back to the full JSM 2019 program