Abstract:
|
Stochastic gradient algorithms are widely used for large-scale inference problems. However, their use in practice is typically guided by heuristics and trial-and-error rather than rigorous, generalizable theory. We take a step toward better understanding the effect of the tuning parameters of these algorithms by characterizing the large-sample behavior of the iterates of a very general class of preconditioned stochastic gradient algorithms with fixed step size, including stochastic gradient descent with and without additional Gaussian noise, momentum, and/or acceleration. We show that, near a local optimum, the iterates converge weakly to paths of an Ornstein–Uhlenbeck process. In particular, with appropriate choices of tuning parameters, the limiting stationary covariance can match either the Bernstein–von Mises limit of the posterior or adjustments of it that are robust to model misspecification. Moreover, an essentially independent sample from the stationary distribution can be obtained after a fixed number of passes over the dataset. Our results show that properly tuned stochastic gradient algorithms offer a practical approach to inference that is computationally efficient and statistically robust.
|