Friday, May 18

Survey Science

Fri, May 18, 10:30 AM - 12:00 PM
Lake Fairfax B

Systematic Sampling Design with Application to Data Splitting (304493)

*Redouane Betrouni, George Mason University
Edward Wegman, George Mason University

Keywords: systematic sampling; data splitting; prediction

In this study we propose a new scheme that uses sampling designs, such as stratified systematic sampling, to optimally split data into training and testing subsets. This procedure helps machine learning algorithms avoid the classical mistake of overfitting. Although it may be slightly more computationally expensive, it compensates for this apparent weakness by yielding a better estimate of test error and improved prediction performance. We provide computational evidence of the benefits of the proposed sampling designs over the traditional approach of simple random splitting of the data into training and testing subsets. We also present an example showing how using simple random sampling to partition the data can distort the relationship between important covariates and the variable of interest in the test dataset.
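
The abstract does not spell out the exact procedure, so the following is only a minimal sketch of one way a stratified systematic split could look, not the authors' method. It assumes that units are ordered by the response within each stratum and that every k-th unit after a random start goes to the test set. The function name `systematic_split` and its parameters `test_fraction`, `strata`, and `seed` are hypothetical names introduced for illustration.

```python
import numpy as np


def systematic_split(y, test_fraction=0.2, strata=None, seed=0):
    """Return (train_idx, test_idx) from a stratified systematic sample.

    Assumption (not from the abstract): within each stratum, units are
    sorted by the response y, and every k-th unit after a random start
    is assigned to the test set, with k = round(1 / test_fraction).
    """
    y = np.asarray(y)
    n = len(y)
    rng = np.random.default_rng(seed)

    # With no strata supplied, treat the whole dataset as a single stratum.
    if strata is None:
        strata = np.zeros(n, dtype=int)
    strata = np.asarray(strata)

    step = int(round(1.0 / test_fraction))  # sampling interval k
    test_mask = np.zeros(n, dtype=bool)

    for s in np.unique(strata):
        members = np.flatnonzero(strata == s)
        # Order the stratum's units by the response value.
        ordered = members[np.argsort(y[members], kind="stable")]
        # Random start in [0, step), then take every step-th unit.
        start = int(rng.integers(step))
        test_mask[ordered[start::step]] = True

    return np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


if __name__ == "__main__":
    # Synthetic data, for illustration only.
    rng = np.random.default_rng(1)
    y = rng.normal(size=200)
    groups = rng.integers(0, 2, size=200)  # two hypothetical strata
    train_idx, test_idx = systematic_split(y, 0.2, strata=groups)
    print(len(train_idx), len(test_idx))
    print(y[train_idx].mean(), y[test_idx].mean())
```

Because the test units are spread evenly across the ordered response within each stratum, the test set covers the full range of the variable of interest, which is the property a simple random split cannot guarantee on any single draw.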
