Keywords: Bayesian inference, data-dependent prior, model averaging, predictive distribution, uncertainty quantification
Prediction is often the primary goal of fitting a regression model, but most recent work focuses on inference tasks such as estimation and feature selection. In this paper we adopt the familiar sparse, high-dimensional linear regression model but focus on the task of prediction. In particular, we consider a new empirical Bayes framework that uses the data to appropriately center the prior distribution for the non-zero regression coefficients, and we investigate the method’s theoretical and numerical performance in the context of prediction. We show that, in certain settings, the posterior concentrates asymptotically at a very fast rate in metrics relevant to prediction quality, and we establish a Bernstein–von Mises theorem that ensures the derived prediction intervals achieve the target coverage probability. Numerical results complement the asymptotic theory, showing that, in addition to strong finite-sample performance in terms of prediction accuracy and uncertainty quantification, the method is considerably faster computationally than existing Bayesian methods.