Abstract:
While much work has been done on model selection for high-dimensional regression, less attention has been given to model averaging in high dimensions. Because the high-dimensional setting makes model identification more difficult, a well-constructed model average can often provide large gains in prediction accuracy over model selection when the number of covariates is large. However, deciding which models, and how many, to combine is particularly challenging in high-dimensional regression, and the topic remains under-studied. Another important question is whether model combination really improves prediction when a single good model does exist. To address these challenges, we introduce a procedure called Selected Model Averaging (SMA) that uses resampling to adaptively determine which models, and how many, to combine. Unlike many other model averaging methods, SMA reduces to model selection when appropriate and thus bridges the gap between model selection and model combination. Numerical studies demonstrate that SMA performs well in a broad variety of high-dimensional settings.