Abstract:
|
We propose a novel diagnostic method to detect the phase change from transient to stationary in Stochastic Gradient Descent (SGD) with a constant learning rate. It combines ideas from the Pflug diagnostic introduced by Chee and Toulis (2018) and the splitting procedure proposed by Su and Zhu (2018). We use this diagnostic in an SGD loop where, every time stationarity is detected, the learning rate is decreased by a factor gamma. We prove a theoretical guarantee for the asymptotic validity of this procedure and show through simulations that it improves on several existing optimization techniques. In particular, it tolerates less precise tuning of the learning rate, which is usually of critical importance in iterative methods.
|
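The outer loop described in the abstract can be sketched as follows. This is a minimal illustration and not the diagnostic analyzed in the paper. The stationarity check is a simplified stand-in that runs two SGD threads from the same point and flags stationarity when their net displacements have a non-positive inner product. All function names, the round length `n_steps`, and the reading of "decreased by a factor gamma" as multiplication by a gamma in (0, 1) are assumptions made for the example.

```python
import numpy as np


def sgd_thread(theta, grad_fn, lr, n_steps, rng):
    """Run n_steps of constant-learning-rate SGD from theta; return the endpoint."""
    theta = theta.copy()
    for _ in range(n_steps):
        theta -= lr * grad_fn(theta, rng)
    return theta


def split_check(theta, grad_fn, lr, n_steps, rng):
    """Simplified, hypothetical splitting check (not the paper's exact test).

    Two threads start from the same point. In the transient phase both tend to
    move in a common direction, so their displacements correlate positively.
    Near stationarity the displacements are dominated by noise.
    """
    d1 = sgd_thread(theta, grad_fn, lr, n_steps, rng) - theta
    d2 = sgd_thread(theta, grad_fn, lr, n_steps, rng) - theta
    stationary = float(d1 @ d2) <= 0.0
    # Continue from the average of the two thread endpoints (an assumption).
    return stationary, theta + 0.5 * (d1 + d2)


def sgd_with_decay(theta0, grad_fn, lr0, gamma, n_rounds, n_steps, seed=0):
    """SGD loop that multiplies the learning rate by gamma whenever the check fires."""
    rng = np.random.default_rng(seed)
    theta, lr = np.asarray(theta0, dtype=float), float(lr0)
    lrs = []
    for _ in range(n_rounds):
        stationary, theta = split_check(theta, grad_fn, lr, n_steps, rng)
        if stationary:
            lr *= gamma  # assumes 0 < gamma < 1
        lrs.append(lr)
    return theta, lrs


def noisy_quadratic_grad(theta, rng):
    """Toy stochastic gradient of 0.5 * ||theta||^2 with additive Gaussian noise."""
    return theta + rng.normal(size=theta.shape)


theta_hat, lrs = sgd_with_decay(np.ones(5) * 10.0, noisy_quadratic_grad,
                                lr0=0.1, gamma=0.5, n_rounds=20, n_steps=50)
```

Here `lrs` records the learning rate after each round, so it can be inspected to see when the check fired. The toy objective and noise model are chosen only to make the sketch runnable.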