Abstract:
|
For solving large-scale non-convex problems, we propose inexact variants of trust region and adaptive cubic regularization methods, which, to increase efficiency, incorporate various approximations. In particular, in addition to approximate subproblem solves, both the Hessian and the gradient are suitably approximated. Under rather mild conditions on such approximations, we show that our proposed inexact methods achieve the same optimal worst-case iteration complexities as their exact counterparts. Our proposed algorithms, and their corresponding theoretical analysis, do not require knowledge of any unknowable problem-related quantities, and hence are easily implementable in practice. Furthermore, using approximate Hessian information, we propose a novel large-batch training method which combines recent results in adversarial training (to regularize against “sharp minima”) and second-order optimization (to use curvature information to adaptively change the batch size during training). Without any additional hyper-parameter tuning, our method can reduce the number of SGD iterations for ResNet18 on CIFAR-10/ImageNet by factors of up to 44.8x and 28.8x, respectively.
|
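To make the kind of approximations described above concrete, below is a minimal Python/NumPy sketch of an inexact trust-region iteration for a finite-sum problem: the gradient and Hessian are sub-sampled over mini-batches, and the quadratic subproblem is solved only approximately via the classical Cauchy point. This is an illustration under simplifying assumptions, not the paper's algorithm: the toy quadratic objective, the mini-batch sizes, and the helper names (`sub_grad`, `sub_hess`, `cauchy_step`) are all hypothetical, and the paper's conditions on the approximations and its subproblem solvers are more general.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy finite-sum problem: f(x) = (1/n) * sum_i [0.5 * x^T A_i x - b_i^T x].
# The random symmetric A_i make the average possibly indefinite (non-convex).
n, d = 1000, 20
A = rng.standard_normal((n, d, d))
A = 0.5 * (A + A.transpose(0, 2, 1))  # symmetrize each A_i
b = rng.standard_normal((n, d))

def f(x):
    """Full objective, used only to measure actual decrease."""
    Ax = A @ x                        # shape (n, d)
    return np.mean(0.5 * (Ax @ x) - b @ x)

def sub_grad(x, idx):
    """Approximate (sub-sampled) gradient over the mini-batch `idx`."""
    return np.mean(A[idx] @ x - b[idx], axis=0)

def sub_hess(x, idx):
    """Approximate (sub-sampled) Hessian over the mini-batch `idx`."""
    return np.mean(A[idx], axis=0)

def cauchy_step(g, H, delta):
    """Inexact subproblem solve: minimize the quadratic model along -g
    within the trust region (the classical Cauchy point)."""
    gHg = g @ H @ g
    gnorm = np.linalg.norm(g)
    if gHg <= 0:
        tau = 1.0                     # negative curvature: step to the boundary
    else:
        tau = min(gnorm**3 / (delta * gHg), 1.0)
    return -tau * (delta / gnorm) * g

x, delta = np.zeros(d), 1.0
for it in range(50):
    idx_g = rng.choice(n, size=128, replace=False)  # gradient sample
    idx_h = rng.choice(n, size=32, replace=False)   # smaller Hessian sample
    g, H = sub_grad(x, idx_g), sub_hess(x, idx_h)
    if np.linalg.norm(g) < 1e-6:
        break
    s = cauchy_step(g, H, delta)
    pred = -(g @ s + 0.5 * (s @ H @ s))             # predicted model decrease
    rho = (f(x) - f(x + s)) / pred                  # actual-to-predicted ratio
    if rho > 0.1:                                   # accept the step
        x = x + s
        if rho > 0.75:
            delta = min(2.0 * delta, 10.0)          # expand the region
    else:
        delta *= 0.5                                # reject and shrink

print(it, np.linalg.norm(sub_grad(x, np.arange(n))))
```

The same accept/reject and radius-update logic carries over when the Cauchy point is replaced by a more accurate inexact solver, which is where the complexity guarantees summarized in the abstract come into play.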