Saturday, February 20
PS3 Poster Session 3 & Continental Breakfast sponsored by Capital One
Sat, Feb 20, 8:00 AM - 9:15 AM
Practical Limitations of the Test/Validation Analysis Strategy (303220)* — Christopher Holloman, Information Control Corporation
Keywords: test, validation, time series, scientific foundation
Data analysts frequently divide data sets into training and validation components. The training component is used for data exploration and model building, while the validation component is used for assessing the performance of the selected model. This strategy is invoked as a way to avoid model overfitting and to quantify the expected performance of the selected model on new data sets. In practice, however, it is unusual for a validation data set to be used only a single time. If the validation data set indicates that the model does not accurately describe the data-generating mechanism, the analyst returns to the training data set to build a new model. Cycling between building models on the training data set and assessing them on the validation data set reduces the utility of the validation: the validation data set becomes an ad hoc component of model fitting. We first describe the scientific foundations for using test and validation data sets. Then, we explore the impact of performing multiple model-building/assessment cycles on the accuracy of the final model using both simulated data sets and a case study.
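The cycling problem described in the abstract can be illustrated with a minimal simulation (this sketch is not the authors' study design; the data, helper names, and greedy selection scheme are illustrative assumptions). On pure-noise data, repeatedly using the validation set to choose among candidate models drives the validation error optimistically low, while error on a fresh test set stays near the true noise level:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pure-noise data: no feature truly predicts y, so any apparent
# "signal" selected below is an artifact of reusing the validation set.
n, p = 100, 50
X = rng.normal(size=(3 * n, p))
y = rng.normal(size=3 * n)
X_train, X_val, X_test = X[:n], X[n:2 * n], X[2 * n:]
y_train, y_val, y_test = y[:n], y[n:2 * n], y[2 * n:]

def fit_predict(cols, X_fit, y_fit, X_new):
    # Ordinary least squares on the selected columns, with intercept.
    A = np.column_stack([np.ones(len(X_fit)), X_fit[:, cols]])
    beta, *_ = np.linalg.lstsq(A, y_fit, rcond=None)
    B = np.column_stack([np.ones(len(X_new)), X_new[:, cols]])
    return B @ beta

def mse(a, b):
    return float(np.mean((a - b) ** 2))

# Each cycle: build a new candidate model on the TRAINING data
# (add one feature), keep it only if it improves VALIDATION error.
# The validation set is consulted on every cycle, so it gradually
# becomes part of the model-fitting process.
selected = []
best_val = mse(y_val, np.full(n, y_train.mean()))
for _ in range(20):
    best_col, best_score = None, best_val
    for j in range(p):
        if j in selected:
            continue
        pred = fit_predict(selected + [j], X_train, y_train, X_val)
        score = mse(y_val, pred)
        if score < best_score:
            best_col, best_score = j, score
    if best_col is None:
        break  # no candidate improves validation error
    selected.append(best_col)
    best_val = best_score

# A never-before-used test set reveals the optimism of the
# repeatedly reused validation estimate.
test_mse = mse(y_test, fit_predict(selected, X_train, y_train, X_test))
print(f"validation MSE after cycling: {best_val:.3f}")
print(f"fresh test MSE:               {test_mse:.3f}")
```

Because the features are independent of the response, an honest error estimate should sit near the noise variance of 1; the reused validation set reports a substantially smaller number, which is exactly the degraded utility the abstract describes.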