Abstract:
|
Predictive models have long been known to benefit from regularization, in which a penalty term is chosen to penalize model complexity and prevent overfitting (e.g., ridge regression and the LASSO). In a Bayesian setting, such penalty terms correspond to the statistician’s choice of prior and/or hyperparameters. However, fitting Bayesian models often requires nontrivial computational resources, which makes both cross-validation and exhaustive grid search undesirable for choosing hyperparameters. In this project, we explore how sequential importance sampling and stochastic optimization can be combined to efficiently identify hyperparameters that minimize out-of-sample prediction error for large Bayesian models. We demonstrate the approach with a spatially varying coefficient (SVC) model for lake nutrient data, fitted to 10,000 lake observations from the LAGOS Northeast data set. We show that hyperparameter choice can have a dramatic impact on the predictive accuracy of SVC models, and that our methodology can reduce computational time while simultaneously improving out-of-sample prediction.
|