Online Program

Friday, October 20
Fri, Oct 20, 10:00 AM - 11:30 AM
Aventine Ballroom A
Celebrating our Technical Contributions

Generalized Cp (GCp) and Bootstrap for Model Selection (304008)

*Linda Zhou, University of Pennsylvania 

Mallows' Cp is a frequently used tool for variable selection in linear models. It can be derived and interpreted as an estimate of (normalized) predictive squared error, but only in a very special situation. That situation has two key features: first, the covariates, both those observed in the sample and those of the predictive population, are treated as a fixed design; second, the observations in the sample and in the predictive universe follow a homoscedastic linear model. The former assumption does not accord with most common statistical settings, and the latter is very frequently too optimistic in practice. We derive an easily computed variant of Mallows' expression that relies on neither assumption. The new variant, denoted GCp, provides an asymptotically unbiased estimate of the predictive squared error when the best linear estimator based on the currently selected variables is used for future observations drawn from the same population. The formulation is model-free in that it makes virtually no assumptions about the true sampling distribution, and it is applicable to random designs. A simple bootstrap method is also proposed to achieve the same goal. This is joint work with L. Brown, J. Cai and the Wharton team.
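For orientation, the sketch below computes the classical Mallows' Cp that the abstract starts from, alongside a generic pairs (out-of-bag) bootstrap estimate of predictive squared error. It is not the GCp formula or the authors' bootstrap procedure, neither of which is given in the abstract. The function names mallows_cp and bootstrap_pred_error, the intercept handling, and the defaults n_boot=200 and seed=0 are all assumptions made for illustration.

```python
import numpy as np


def _ols_sse(X, y):
    """Fit OLS with an intercept; return (residual sum of squares, number of coefficients)."""
    Z = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return float(resid @ resid), Z.shape[1]


def mallows_cp(X_full, y, cols):
    """Classical Mallows' Cp for the submodel using columns `cols` of X_full.

    Cp = SSE_p / sigma2_hat - n + 2p, where sigma2_hat is the residual
    variance of the full model and p counts the submodel's coefficients
    (intercept included). This is the fixed-design, homoscedastic version.
    """
    n = len(y)
    sse_full, k_full = _ols_sse(X_full, y)
    sigma2_hat = sse_full / (n - k_full)
    sse_sub, p = _ols_sse(X_full[:, cols], y)
    return sse_sub / sigma2_hat - n + 2 * p


def bootstrap_pred_error(X_full, y, cols, n_boot=200, seed=0):
    """Generic pairs-bootstrap (out-of-bag) estimate of predictive squared error.

    Rows (x_i, y_i) are resampled together, so no fixed-design or
    homoscedasticity assumption is built in. Illustrative only; not the
    procedure proposed in the talk.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    Z = np.column_stack([np.ones(n), X_full[:, cols]])
    errs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)            # resample rows with replacement
        oob = np.setdiff1d(np.arange(n), idx)       # rows left out of this resample
        if oob.size == 0:
            continue
        beta, *_ = np.linalg.lstsq(Z[idx], y[idx], rcond=None)
        errs.append(np.mean((y[oob] - Z[oob] @ beta) ** 2))
    return float(np.mean(errs))
```

Comparing candidate variable subsets then amounts to calling either function with different `cols` lists and preferring the subset with the smaller value. The bootstrap version resamples whole observations, which is one common way to accommodate random covariates.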