All Times ET

Friday, June 4

Simulation-Based Statistics

Fri, Jun 4, 11:25 AM - 1:00 PM
TBD

K-Fold Cross-Validation for Complex Sample Surveys (309674)

Presentation

Cole Guerin, Colby College
Thomas McMahon, Colby College
*Jerzy Wieczorek, Colby College

Keywords: Survey sampling, Cross validation, Model selection

Although K-fold cross-validation (CV) is widely used for model evaluation and selection, there has been limited understanding of how to perform CV for non-iid data, including data from sampling designs with unequal selection probabilities. We introduce CV methodology that is appropriate for design-based inference from complex survey sampling designs. For such data, we claim that inferences tend to improve when the folds are chosen and the test errors are computed in ways that account for survey design features such as stratification and clustering. Our mathematical arguments are supported by simulations, and our methods are illustrated on real survey data.
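
As a rough illustration of what accounting for the design when forming folds and computing test errors could look like, the sketch below keeps every sampled cluster intact within one fold, spreads each stratum's clusters across the folds, and averages squared test errors using sampling weights. It is not the authors' implementation. The function names `make_survey_folds` and `weighted_mse` are hypothetical, and the weights are assumed to be inverse selection probabilities.

```python
import random
from collections import defaultdict


def make_survey_folds(strata, clusters, k, seed=0):
    """Return a fold index (0..k-1) for each unit.

    Assumption: cluster labels are unique within a stratum. Whole clusters
    are assigned to a fold, and each stratum's clusters are dealt out
    across the folds in turn.
    """
    rng = random.Random(seed)

    # Collect the distinct clusters that appear in each stratum.
    clusters_by_stratum = defaultdict(set)
    for s, c in zip(strata, clusters):
        clusters_by_stratum[s].add(c)

    fold_of_cluster = {}
    for s in sorted(clusters_by_stratum):
        ids = sorted(clusters_by_stratum[s])
        rng.shuffle(ids)            # random order of clusters in this stratum
        offset = rng.randrange(k)   # random starting fold for this stratum
        for i, c in enumerate(ids):
            fold_of_cluster[(s, c)] = (i + offset) % k

    # Every unit inherits the fold of its (stratum, cluster) pair.
    return [fold_of_cluster[(s, c)] for s, c in zip(strata, clusters)]


def weighted_mse(y, y_hat, weights):
    """Weighted mean squared error over a test fold."""
    total = sum(weights)
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, y, y_hat)) / total


# Toy design: 2 strata, 3 clusters per stratum, 2 units per cluster.
strata = ["A"] * 6 + ["B"] * 6
clusters = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]
folds = make_survey_folds(strata, clusters, k=3)
```

In a full CV loop, a model would be fit on the units outside each fold (ideally with a survey-weighted fitting procedure) and `weighted_mse` would be evaluated on the held-out fold using that fold's weights.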

American Statistical Association