Abstract Details

Activity Number: 544
Type: Contributed
Date/Time: Wednesday, August 3, 2016 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #320472
Title: Systematic Sampling Design with Application to Data Splitting
Author(s): Redouane Betrouni* and James E. Gentle
Companies: George Mason University and George Mason University
Keywords: systematic sampling ; data splitting ; prediction

In this study we propose a new scheme that uses sampling designs, such as stratified systematic sampling, to optimally split data into training and testing subsets. This procedure helps machine learning algorithms avoid the classical mistake of overfitting. Although it may be slightly more computationally expensive, it makes up for this apparent weakness by giving a better estimate of test error and improving prediction performance. We provide computational evidence of the benefits of the proposed sampling designs over the traditional approach of splitting the data into training and testing subsets by simple random sampling. We also present an example showing how partitioning data by simple random sampling can distort the relationship between important covariates and the variable of interest in the test dataset.
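The abstract does not spell out the design, so the following is only a minimal sketch of one way a stratified systematic split could look, not the authors' procedure. It assumes that strata are contiguous blocks of units ranked by the response, and that within each stratum every k-th unit after a random start goes to the test set. The function name `stratified_systematic_split` and the parameters `n_strata`, `k`, and `seed` are hypothetical, and the data are synthetic.

```python
import random

def stratified_systematic_split(y, n_strata=4, k=5, seed=0):
    """Return (train_idx, test_idx) for a response vector y.

    Assumed design (illustrative only): rank units by y, cut the ranking
    into n_strata contiguous blocks, and within each block send every
    k-th unit after a random start to the test set.
    """
    rng = random.Random(seed)
    n = len(y)
    order = sorted(range(n), key=lambda i: y[i])  # unit indices ranked by response
    test = []
    for s in range(n_strata):
        lo = s * n // n_strata
        hi = (s + 1) * n // n_strata
        stratum = order[lo:hi]            # contiguous block of the ranking
        start = rng.randrange(k)          # random start in [0, k)
        test.extend(stratum[start::k])    # systematic selection with step k
    test_set = set(test)
    train = [i for i in range(n) if i not in test_set]
    return train, sorted(test)

# Synthetic response values, used only to exercise the function.
y = [((37 * i) % 101) / 10.0 for i in range(100)]
train_idx, test_idx = stratified_systematic_split(y, n_strata=4, k=5, seed=1)
print(len(train_idx), len(test_idx))
```

Because each stratum contributes test units at a fixed spacing, the test set in this sketch spans the full range of the response. A simple random split gives no such guarantee, which is the kind of distortion the abstract describes.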

Authors who are presenting talks have a * after their name.

Back to the full JSM 2016 program

Copyright © American Statistical Association