Abstract:
|
We introduce a one-step model selection technique for general regression estimators. Under very general assumptions, this technique correctly identifies the set of non-zero entries in the true coefficient vector (of length p) by comparing only p + 1 models. We first define our selection criterion for a class of candidate models larger than previously considered, and provide population-level results that differentiate between correct and wrong models within this class. We then provide results for a general bootstrap scheme to estimate the criterion in the sample setting, and discuss its details for linear and linear mixed models. Simulations and a real data example demonstrate the efficacy of our method over existing model selection strategies, both in detecting the correct set of predictors and in producing accurate out-of-sample predictions. Finally, we discuss some immediate applications and possible extensions of this foundational methodology.
|