
Abstract Details

Activity Number: 75 - Probability and Statistics
Type: Contributed
Date/Time: Sunday, July 28, 2019, 4:00 PM to 5:50 PM
Sponsor: IMS
Abstract: #305216
Title: Cross-Validation Nonparametric Bootstrap Study of the Linhart-Volkers-Zucchini Out-Of-Sample Prediction Error Formula for Logistic Regression Modeling
Author(s): Richard Golden* and Shaurabh Nandy and Vishal Patel
Companies: University of Texas at Dallas and Foxbat Research and Foxbat Research
Keywords: Akaike Information Criterion; Takeuchi Information Criterion; Cross-Validation; M-Estimation; Out-of-Sample Prediction Error; Nonparametric Bootstrap

Cross-validation (CV) methods are widely used to estimate out-of-sample prediction error, but they are computationally expensive in big-data problems, which makes analytical formulas attractive alternatives. If parameter estimation takes T seconds for a data set with N records, leave-one-out CV estimation takes TN seconds. Linhart and Volkers (1984; see also Linhart and Zucchini, 1986) showed that a particular large-sample analytic out-of-sample prediction error estimator is an unbiased estimator of the CV estimation error for a large class of smooth empirical risk functions, reducing the estimation time from TN seconds to T seconds. This theoretical result extends the Takeuchi Information (Takeuchi, 1976) and Akaike Information (Akaike, 1973) Criteria. We provide easily verifiable assumptions under which this theoretical result holds. In addition, we report empirical results for logistic regression modeling showing that the mean relative deviation between a nonparametric bootstrap CV estimator and the analytic out-of-sample prediction error estimator was less than 0.3% for three data sets with respective sample sizes of n = 583, n = 1728, and n = 4898 records.
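The T-versus-TN trade-off described in the abstract can be sketched in code. The snippet below is a minimal illustration, not the authors' implementation: it fits logistic regression by Newton-Raphson on a small synthetic data set (the data, sample size, and fitting routine are all assumptions for illustration), then compares leave-one-out CV, which requires n refits, against a TIC-style analytic estimate that adds a trace(A⁻¹B)/n penalty to the in-sample risk after a single fit.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, iters=25):
    """Fit logistic regression by Newton-Raphson (maximum likelihood)."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        w -= np.linalg.solve(hess, grad)
    return w

def avg_nll(w, X, y):
    """Average negative log-likelihood (the empirical risk)."""
    z = X @ w
    return np.mean(np.logaddexp(0.0, z) - y * z)

# Hypothetical synthetic data; the abstract's real data sets (n = 583, 1728,
# 4898) are not reproduced here.
n, w_true = 200, np.array([0.25, 0.5, -0.5])
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
y = (rng.random(n) < sigmoid(X @ w_true)).astype(float)

# Analytic (TIC-style) estimate from ONE fit (time ~ T): in-sample risk plus
# trace(A^{-1} B) / n, where A is the average per-record Hessian and B the
# average outer product of per-record gradients.
w_hat = fit_logreg(X, y)
p = sigmoid(X @ w_hat)
A = (X.T @ (X * (p * (1.0 - p))[:, None])) / n
G = X * (p - y)[:, None]                      # per-record score vectors
B = (G.T @ G) / n
tic_estimate = avg_nll(w_hat, X, y) + np.trace(np.linalg.solve(A, B)) / n

# Leave-one-out CV: n separate refits (time ~ T*N).
loo = 0.0
for i in range(n):
    mask = np.arange(n) != i
    w_i = fit_logreg(X[mask], y[mask])
    loo += avg_nll(w_i, X[i:i + 1], y[i:i + 1])
loo /= n

print(f"analytic estimate: {tic_estimate:.4f}  LOO-CV estimate: {loo:.4f}")
```

Under these simulation assumptions the two estimates typically agree closely, while the analytic formula avoids the n refits entirely; the abstract's contribution is establishing conditions under which this agreement holds and quantifying it with a nonparametric bootstrap.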

Authors who are presenting talks have a * after their name.
