Keywords: cross-validation, temporal holdout, survey weights
The supervised machine learning process can involve any of a wide variety of model selection methods, such as k-fold cross-validation, leave-one-out cross-validation, temporal holdout, and random subsampling. Many of these methods have been studied extensively under the typical machine learning assumption of an independent and identically distributed sample. However, there is a significant gap in the literature on applying these model selection and validation methods to complex sample survey data. Although some work has been done on adapting algorithms or models, such as logistic regression or decision trees, to account for survey weights, very little work has been done on using survey weights in model evaluation and selection, and none explores the effects of various evaluation methods in any depth. We use simulation studies based on characteristics of real surveys and real data to explore the effects of incorporating survey weights into model selection and validation.
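As a purely illustrative sketch of what incorporating survey weights into model evaluation can look like, the following Python code computes a k-fold cross-validation estimate of mean squared error twice: once as a plain average over held-out observations, and once as an average weighted by the survey weights. It is not the procedure used in the simulation studies. The function names `kfold_indices` and `weighted_cv_mse`, the linear model fitted by weighted least squares, and the assumption that fold assignment ignores the sample design (strata and clusters) are all choices made for this example only.

```python
import numpy as np

def kfold_indices(n, k, rng):
    """Split range(n) into k shuffled, nearly equal folds (hypothetical helper)."""
    perm = rng.permutation(n)
    return np.array_split(perm, k)

def weighted_cv_mse(X, y, w, k=5, seed=0):
    """Return (unweighted, weighted) k-fold CV estimates of mean squared error.

    X: (n, p) feature array, y: (n,) outcomes, w: (n,) positive survey weights.
    Assumption for illustration: folds are drawn by simple random assignment.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    sq_err = np.empty(n)
    for test_idx in kfold_indices(n, k, rng):
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        # Weighted least squares on the training folds: scale rows by sqrt(w).
        sw = np.sqrt(w[train_idx])
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        beta, *_ = np.linalg.lstsq(A_train * sw[:, None], y[train_idx] * sw, rcond=None)
        # Squared prediction error on the held-out fold.
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        sq_err[test_idx] = (y[test_idx] - A_test @ beta) ** 2
    unweighted = sq_err.mean()                  # treats the sample as i.i.d.
    weighted = np.average(sq_err, weights=w)    # weights each unit by its survey weight
    return unweighted, weighted
```

With equal weights the two estimates coincide; with unequal weights they can diverge, and comparing them across candidate models is one simple way to see whether weighting changes which model would be selected.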