Abstract Details

Activity Number: 313 - Statistical Models in Survey Sampling and Analysis
Type: Contributed
Date/Time: Tuesday, July 31, 2018 : 8:30 AM to 10:20 AM
Sponsor: Survey Research Methods Section
Abstract #330498 Presentation
Title: Estimating Prediction Error for Complex Samples
Author(s): Andrew James Holbrook* and Daniel L. Gillen and Thomas Lumley
Companies: UC Irvine and University of California, Irvine and University of Auckland
Keywords: AIC; Generalized linear models; Generalization error; Horvitz-Thompson; Survey samples; NHANES
Abstract:

With growing interest in using complex samples to train prediction models for numerous outcomes, it is necessary to account for the sampling design that gave rise to the data in order to assess the generalized predictive utility of a proposed prediction rule. After learning a prediction rule from a non-uniform sample, it is of interest to estimate the rule's error rate when applied to unobserved members of the population. Efron (1986) proposed a general class of covariance-inflated prediction error estimators that assume the available training data are representative of the target population to which the prediction rule is to be applied. We extend Efron's estimator to the complex sample context by incorporating Horvitz-Thompson sampling weights and show that it is consistent for the true generalization error rate when applied to the underlying superpopulation. The resulting Horvitz-Thompson-Efron (HTE) estimator is equivalent to dAIC, a recent extension of AIC to survey sampling data, but is more widely applicable. The proposed methodology is assessed with simulations and is applied to models predicting renal function, using data from the large-scale NHANES survey.
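
As a rough illustration of the weighting idea only, the sketch below computes an inverse-probability-weighted apparent squared error. It is not the HTE estimator described above: the covariance-inflation term is omitted, and the function name `ht_weighted_error`, the squared-error loss, and the normalization by the sum of the weights (a Hajek-style ratio) are all assumptions made for this example.

```python
def ht_weighted_error(y, y_pred, incl_prob):
    """Inverse-probability-weighted mean squared error (illustrative sketch).

    y         : observed outcomes for the sampled units
    y_pred    : predictions for the same units
    incl_prob : inclusion probability of each sampled unit, in (0, 1]

    Hypothetical helper: the loss and the normalization are assumptions,
    and no covariance penalty is added.
    """
    if not (len(y) == len(y_pred) == len(incl_prob)):
        raise ValueError("inputs must have the same length")
    if len(y) == 0:
        raise ValueError("inputs must be non-empty")
    if any(p <= 0 or p > 1 for p in incl_prob):
        raise ValueError("inclusion probabilities must lie in (0, 1]")

    # Horvitz-Thompson weights: each sampled unit stands in for 1/pi units.
    weights = [1.0 / p for p in incl_prob]

    # Weighted sum of squared errors, normalized by the total weight.
    weighted_loss = sum(w * (yi - fi) ** 2
                        for w, yi, fi in zip(weights, y, y_pred))
    return weighted_loss / sum(weights)


# Toy data: the third unit was under-sampled, so it counts for more.
y = [1.0, 2.0, 3.0]
y_pred = [1.0, 1.0, 1.0]
incl_prob = [0.5, 0.5, 0.25]

weighted = ht_weighted_error(y, y_pred, incl_prob)
unweighted = sum((a - b) ** 2 for a, b in zip(y, y_pred)) / len(y)
```

In this toy example the weighted error exceeds the unweighted one because the worst-predicted unit has the smallest inclusion probability, which is the kind of discrepancy that ignoring the sampling design would hide.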


Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program