Online Program

All Times ET

Friday, June 4
Computational Statistics
Simulation-Based Statistics
Fri, Jun 4, 11:25 AM - 1:00 PM

K-Fold Cross-Validation for Complex Sample Surveys (309674)


Cole Guerin, Colby College 
Thomas McMahon, Colby College 
*Jerzy Wieczorek, Colby College 

Keywords: Survey sampling, Cross validation, Model selection

Although K-fold cross-validation (CV) is widely used for model evaluation and selection, there has been limited understanding of how to perform CV for non-iid data, including data from sampling designs with unequal selection probabilities. We introduce CV methodology that is appropriate for design-based inference from complex survey sampling designs. For such data, we argue that inferences tend to improve when the folds are chosen and the test errors are computed in ways that account for survey design features such as stratification and clustering. Our mathematical arguments are supported by simulations, and our methods are illustrated on real survey data.
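
As a rough illustration of what design-aware folds and test errors could look like, the sketch below keeps each cluster whole within a single fold, deals clusters out to folds separately within each stratum, and weights squared test errors by the sampling weights. This is not the authors' implementation: the function names (`survey_folds`, `weighted_mse`, `cv_weighted_mean`), the toy data, the use of a weighted mean as the "model", and the choice of weighted mean squared error are all assumptions made for the example.

```python
import random
from collections import defaultdict


def survey_folds(strata, clusters, k, seed=0):
    """Return a fold id in 0..k-1 for each observation (hypothetical helper).

    Whole clusters are assigned to a single fold, and clusters are dealt
    out to folds round-robin within each stratum after shuffling.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(set)
    for s, c in zip(strata, clusters):
        by_stratum[s].add(c)
    fold_of = {}
    for s in sorted(by_stratum):
        ids = sorted(by_stratum[s])
        rng.shuffle(ids)
        offset = rng.randrange(k)  # avoid always starting at fold 0
        for i, c in enumerate(ids):
            fold_of[(s, c)] = (i + offset) % k
    return [fold_of[(s, c)] for s, c in zip(strata, clusters)]


def weighted_mse(y, yhat, w):
    """Mean squared error weighted by sampling weights (assumed loss)."""
    num = sum(wi * (yi - pi) ** 2 for yi, pi, wi in zip(y, yhat, w))
    return num / sum(w)


def cv_weighted_mean(y, w, folds, k):
    """CV error of a weighted-mean predictor, pooled across folds."""
    num = den = 0.0
    for f in range(k):
        train = [i for i, g in enumerate(folds) if g != f]
        test = [i for i, g in enumerate(folds) if g == f]
        if not train or not test:
            continue
        # "Fit" the model: weighted mean of the training responses.
        pred = sum(w[i] * y[i] for i in train) / sum(w[i] for i in train)
        num += sum(w[i] * (y[i] - pred) ** 2 for i in test)
        den += sum(w[i] for i in test)
    return num / den


# Toy data (made up): 2 strata, 4 clusters each, 3 observations per cluster.
strata = ["A"] * 12 + ["B"] * 12
clusters = [f"A{i // 3}" for i in range(12)] + [f"B{i // 3}" for i in range(12)]
y = [float(i % 5) for i in range(24)]
w = [2.0] * 12 + [5.0] * 12  # unequal sampling weights by stratum

folds = survey_folds(strata, clusters, k=2, seed=1)
print(cv_weighted_mean(y, w, folds, k=2))
```

Assigning whole clusters to folds keeps correlated observations from appearing in both the training and test sets, and splitting within strata keeps every stratum represented in every fold when each stratum has at least K clusters.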