Activity Number: 544
Type: Contributed
Date/Time: Wednesday, August 3, 2016 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #320472
Title: Systematic Sampling Design with Application to Data Splitting
Author(s): Redouane Betrouni* and James E. Gentle
Companies: George Mason University and George Mason University
Keywords: systematic sampling; data splitting; prediction
Abstract:
In this study we propose a new scheme that uses sampling designs, such as stratified systematic sampling, to optimally split data into training and testing subsets. This procedure helps machine learning algorithms avoid the classical mistake of overfitting. Although it may be slightly more computationally expensive, it compensates for this apparent weakness by giving a better estimate of test error and improved prediction performance. We provide computational evidence of the benefits of the proposed sampling designs over the traditional approach of simple random splitting of the data into training and testing subsets. We also present an example showing how using simple random sampling to partition the data can distort the relationship between important covariates and the variable of interest in the test dataset.
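The abstract does not spell out the design, so the following is only a minimal sketch of what a stratified systematic split could look like, not the authors' procedure. It assumes that units are ordered by the response, that strata are equal-size contiguous blocks of that ordering, and that within each stratum every k-th unit (from a random start) is assigned to the test set. The function name `stratified_systematic_split` and its parameters are hypothetical.

```python
import numpy as np


def stratified_systematic_split(y, test_fraction=0.2, n_strata=4, seed=0):
    """Return (train_idx, test_idx) index arrays for a 1-D response y.

    Hypothetical helper, assumed design: strata are contiguous blocks of the
    units sorted by y; within each stratum every k-th unit goes to the test
    set, starting from a random offset.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    k = max(1, int(round(1.0 / test_fraction)))  # sampling interval
    order = np.argsort(y, kind="stable")         # units ordered by the response
    strata = np.array_split(order, n_strata)     # contiguous blocks of the ordering

    test_parts = []
    for stratum in strata:
        if len(stratum) == 0:
            continue
        start = rng.integers(0, k)               # random start in [0, k)
        test_parts.append(stratum[start::k])     # every k-th unit from the start

    if test_parts:
        test_idx = np.sort(np.concatenate(test_parts))
    else:
        test_idx = np.empty(0, dtype=int)

    train_mask = np.ones(len(y), dtype=bool)
    train_mask[test_idx] = False                 # everything else is training data
    train_idx = np.flatnonzero(train_mask)
    return train_idx, test_idx


# Usage on synthetic data (assumed example, not from the study).
y = np.random.default_rng(1).normal(size=100)
train_idx, test_idx = stratified_systematic_split(y, test_fraction=0.2, n_strata=4)
```

Because the test units are spread evenly along the sorted response, the test set in this sketch covers the whole range of the variable of interest, which is the kind of coverage a simple random split does not guarantee in any single draw.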
Authors who are presenting talks have a * after their name.