Keywords: Stochastic Gradient, Online Learning, Nonstationary Optimization, Error Bound
The stochastic gradient (SG) algorithm is widely used in stochastic optimization. Its standard convergence and asymptotic normality results rest on the assumption of a stationary, unique optimizer. In practice, however, the target parameter and/or the underlying loss function often drift over time, and the notion of "convergence" no longer applies. To keep up with a nonstationary environment, the gain sequence of SG must be non-decaying. We deliver a computable error bound for the constant-gain SG algorithm for practical use. The case of interest requires strong convexity of the time-varying loss function, but the restriction imposed on the drift of the target parameter is minimal. Moreover, we relate recursive stochastic optimization algorithms to a limiting nonautonomous ordinary differential equation (ODE) and provide a probability bound using perturbed-ODE theory. An important special case of the nonautonomous ODE is the class of separable equations; this allows us to bring cyclic stochastic optimization into the discussion by accommodating the switching structure of the sub-loss-function being optimized.
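To illustrate the tracking behavior the abstract describes, the following is a minimal sketch (not the paper's actual setting): a constant-gain SG iterate following a drifting optimizer of a strongly convex quadratic loss, with the drift path, noise level, and gain all chosen here for illustration only. Unlike decaying-gain SG, the tracking error does not vanish, but it remains bounded.

```python
import numpy as np

rng = np.random.default_rng(0)

a = 0.1           # constant (non-decaying) gain
theta = 0.0       # SG iterate
n_steps = 2000
errors = []

for k in range(n_steps):
    # Hypothetical drifting optimizer theta*_k of the time-varying loss
    # L_k(t) = 0.5 * (t - theta*_k)^2 (strongly convex in t).
    target = np.sin(2 * np.pi * k / 500)
    # Noisy gradient measurement of L_k at the current iterate.
    grad = (theta - target) + 0.1 * rng.standard_normal()
    # Constant-gain SG update: theta_{k+1} = theta_k - a * grad.
    theta -= a * grad
    errors.append(abs(theta - target))

# After a burn-in, the tracking error stays bounded rather than
# converging to zero, consistent with a constant-gain error bound.
print(max(errors[500:]))
```

The constant gain trades asymptotic accuracy for responsiveness: a larger `a` tracks faster drift at the cost of more noise in the iterate, which is exactly the trade-off a computable error bound helps quantify.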