Online Program


All Times EDT

Friday, June 5
Machine Learning
Machine Learning 3
Fri, Jun 5, 3:30 PM - 5:05 PM
TBD
 

Machine Learning Model Selection with Complex Sample Survey Data (308369)

*Brian Kim, University of Maryland 

Keywords: cross-validation, temporal holdout, survey weights

The supervised machine learning process can involve using one of a wide variety of model selection methods, such as k-fold cross-validation, leave-one-out cross-validation, temporal holdout, and random subsampling. Many of these methods have been studied extensively under the typical machine learning assumption of an independent and identically distributed sample. However, there is a significant gap in the literature on applying these model selection and validation methods to complex sample survey data. Though some work has been done on adapting algorithms or models, such as logistic regression or decision trees, to account for survey weights, very little work has been done on using survey weights in model evaluation and selection, and none of it explores the effects of various evaluation methods in any depth. We use simulation studies based on characteristics of real surveys and real data to explore the effects of incorporating survey weights into model selection and validation.
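
As a rough illustration of what incorporating survey weights into model evaluation can look like, the sketch below runs k-fold cross-validation on a toy sample and scores held-out predictions with a weighted mean squared error. It is not the method studied in this talk. The data, the constant-mean "model", and the function names (weighted_mean, kfold_indices, cv_mse) are all hypothetical, and the folds ignore strata and clusters, which a real complex design would likely need to respect.

```python
import random


def weighted_mean(values, weights):
    """Weighted average of values; equal weights give the ordinary mean."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_mse(y, w, k=5, weight_fit=True, weight_eval=True, seed=0):
    """Cross-validated MSE of a constant-mean predictor.

    weight_fit:  use survey weights when estimating the mean on training folds.
    weight_eval: use survey weights when averaging held-out squared errors.
    """
    n = len(y)
    ones = [1.0] * n
    fit_w = w if weight_fit else ones
    eval_w = w if weight_eval else ones
    sq_err_sum = 0.0
    weight_sum = 0.0
    for fold in kfold_indices(n, k, seed):
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        pred = weighted_mean([y[i] for i in train], [fit_w[i] for i in train])
        for i in fold:
            sq_err_sum += eval_w[i] * (y[i] - pred) ** 2
            weight_sum += eval_w[i]
    return sq_err_sum / weight_sum


# Hypothetical toy sample: stratum A is oversampled (weight 1 each),
# stratum B is undersampled (weight 16 each).
y = [1.0] * 80 + [5.0] * 20
w = [1.0] * 80 + [16.0] * 20

if __name__ == "__main__":
    for fit in (False, True):
        for ev in (False, True):
            score = cv_mse(y, w, weight_fit=fit, weight_eval=ev)
            print(f"weight_fit={fit!s:5} weight_eval={ev!s:5} cv_mse={score:.3f}")
```

In this toy setup the ranking of the weighted and unweighted fits depends on whether the held-out errors are themselves weighted, which is the kind of effect the abstract proposes to study through simulation.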