Abstract:
|
One goal of model selection is the minimization of predictive risk. Typical derivations of model selection criteria assume that the distributions of covariates in the training set and the future-prediction set are identical. In practice, we are often most interested in forecasts over a novel covariate distribution. This difference in intended use of the model impacts the predictive risk, estimates of that risk, and summaries derived from those estimates.
We formulate the predictive problem with divergence of the future-prediction set from the training set in terms of a diffusion of the distribution of covariates. As the covariates in the training set are diffused, measures of model complexity based on predictive risk change, impacting both model selection and choice of tuning parameters. We provide results in several settings, including subset selection, ridge regression, and kernel regression, and propose ways to adjust the standard measures of model complexity. Simulation studies show the benefits of the adjustment.
|