Abstract:
|
The stochastic gradient method (SGM, also known as SGD) has become the algorithm of choice in machine learning for handling large datasets. We consider the stationary distribution of SGM with small but finite step sizes applied to maximum-a-posteriori (MAP) inference, and find that this distribution can be easily tuned to minimize its KL divergence to the posterior, yielding an alternate interpretation of the stochastic gradient Fisher scoring algorithm of Ahn et al. (2012). With a bit more work and a few more assumptions, it is also possible to minimize the KL divergence in the other direction, yielding usable marginal uncertainty estimates.
|