Online Program

Saturday, February 20
PS3: Poster Session 3 & Continental Breakfast, sponsored by Capital One
Sat, Feb 20, 8:00 AM - 9:15 AM
Ballroom Foyer

Practical Limitations of the Test/Validation Analysis Strategy (303220)

*Christopher Holloman, Information Control Corporation 

Keywords: test, validation, time series, scientific foundation

Data analysts frequently divide data sets into training and validation components. The training component is used for data exploration and model building, while the validation component is used for assessing the performance of the selected model. This strategy is invoked as a way to avoid overfitting and to quantify the expected performance of the selected model on new data sets. In practice, however, a validation data set is rarely used only once. If the validation data set indicates that the model does not accurately describe the data-generating mechanism, the analyst returns to the training data set to build a new model. Cycling between building models on the training data set and assessing them on the validation data set reduces the utility of the validation: the validation data set becomes an ad hoc component of model fitting. We first describe the scientific foundations for using test and validation data sets. Then, we explore the impact of performing multiple model-building/assessment cycles on the accuracy of the final model using both simulated data sets and a case study.
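
The cycling effect described in the abstract can be illustrated with a small simulation. The sketch below is not the author's simulation study; it is a minimal example under stated assumptions: the labels are fair coin flips, every candidate "model" guesses at random (so its true accuracy is 0.5), and the analyst keeps whichever candidate scores best on the validation set. The function name and all parameters are hypothetical.

```python
import random


def simulate_selection_bias(n_cycles=50, n_valid=100, n_fresh=10_000, seed=0):
    """Select the best of n_cycles candidates by validation accuracy.

    Assumption: labels are fair coin flips and each candidate guesses at
    random, so every candidate has a true accuracy of 0.5 by construction.
    Returns (best validation accuracy, accuracy on fresh data).
    """
    rng = random.Random(seed)
    valid_labels = [rng.randint(0, 1) for _ in range(n_valid)]

    # Each cycle stands in for one build-on-training / assess-on-validation
    # round; only the best validation score is kept.
    best_valid_acc = 0.0
    for _ in range(n_cycles):
        guesses = [rng.randint(0, 1) for _ in range(n_valid)]
        acc = sum(g == y for g, y in zip(guesses, valid_labels)) / n_valid
        best_valid_acc = max(best_valid_acc, acc)

    # On fresh data the selected candidate is still guessing at random,
    # so its accuracy should fall back toward 0.5.
    fresh_labels = [rng.randint(0, 1) for _ in range(n_fresh)]
    fresh_guesses = [rng.randint(0, 1) for _ in range(n_fresh)]
    fresh_acc = sum(g == y for g, y in zip(fresh_guesses, fresh_labels)) / n_fresh

    return best_valid_acc, fresh_acc


if __name__ == "__main__":
    best_valid, fresh = simulate_selection_bias()
    print(f"best validation accuracy: {best_valid:.3f}")
    print(f"accuracy on fresh data:   {fresh:.3f}")
```

Because the winning candidate was chosen for its validation score, that score overstates how it would perform on new data, even though no candidate has any real predictive ability. Increasing `n_cycles` in this toy setup tends to widen the gap.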