Abstract:
|
Mallows' Cp is a frequently used tool for variable selection in linear models. It can be derived and interpreted as an estimate of (normalized) predictive squared error, but only in a very special situation. Two key features of that situation are: 1) the covariates, both in the observed sample and in the predictive population, are treated as a fixed design; 2) the observations, both in the sample and in the predictive population, follow a homoscedastic linear model. Assumption 1) does not accord with most of the common statistical settings in which Cp is employed, and assumption 2) is often unduly optimistic in practice. We derive an easily computed variant of Mallows' expression that does not rely on either of these assumptions. The new variant, denoted GCP, provides an asymptotically unbiased estimate of the predictive squared error when the best linear estimator based on the currently selected variables is used for future observations drawn from the same population. The formulation is "assumption-lean" in that it places virtually no assumptions on the true sampling distribution. Joint work with L. Brown, J. Cai and the Wharton group.
|
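For reference, the classical Cp statistic that the abstract takes as its starting point can be written out as below. This is a sketch of the standard textbook form only, not the GCP expression, which the abstract does not state. The symbols are assumptions introduced here for illustration: $n$ is the sample size, $p$ the number of coefficients in the candidate submodel, $\mathrm{SSE}_p$ its residual sum of squares, $\hat\sigma^2$ the residual variance estimate from the full model, $\mu_i$ the true mean response at the $i$-th fixed design point, and $\hat y_i$ the submodel's fitted value there.

```latex
% Classical Mallows' Cp for a candidate submodel with p coefficients
\[
  C_p \;=\; \frac{\mathrm{SSE}_p}{\hat\sigma^2} \;-\; n \;+\; 2p .
\]

% Under a fixed design and homoscedastic errors with variance sigma^2,
% the normalized predictive squared error of the submodel is
\[
  \Gamma_p \;=\; \frac{1}{\sigma^2}\,
  \mathbb{E}\!\left[\sum_{i=1}^{n} \bigl(\hat y_i - \mu_i\bigr)^2\right]
  \;=\; \frac{1}{\sigma^2}\sum_{i=1}^{n}\bigl(\mathbb{E}[\hat y_i] - \mu_i\bigr)^2 \;+\; p ,
\]
% and the residual sum of squares satisfies
\[
  \mathbb{E}\bigl[\mathrm{SSE}_p\bigr]
  \;=\; \sum_{i=1}^{n}\bigl(\mathbb{E}[\hat y_i] - \mu_i\bigr)^2 \;+\; (n - p)\,\sigma^2 ,
\]
% so that E[SSE_p]/sigma^2 - n + 2p = Gamma_p, which C_p estimates
% by replacing sigma^2 with its estimate from the full model.
```

Both displayed expectations rely on exactly the two assumptions singled out in the abstract: the design points are held fixed, and the error variance is a single constant $\sigma^2$.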