
Abstract Details

Activity Number: 261 - High-Dimensional Statistical Inference Meets Large-Scale Optimization
Type: Topic Contributed
Date/Time: Tuesday, August 4, 2020, 1:00 PM to 2:50 PM (EDT)
Sponsor: IMS
Abstract #312773
Title: Understanding the Effect of Learning Rate in Deep Learning
Author(s): Weijie Su*
Companies: University of Pennsylvania
Keywords:
Abstract:

The learning rate is perhaps the single most important parameter in the training of neural networks and, more broadly, in stochastic (nonconvex) optimization. Accordingly, there are numerous effective, but poorly understood, techniques for tuning the learning rate, including learning rate decay. In this talk, we present a general theoretical analysis of the effect of the learning rate in stochastic gradient descent (SGD). Our analysis rests on a learning-rate-dependent stochastic differential equation (lr-dependent SDE) that serves as a surrogate for SGD. For a broad class of objective functions, we establish a linear rate of convergence for this continuous-time formulation of SGD, highlighting the fundamental importance of the learning rate, in contrast to gradient descent and stochastic gradient Langevin dynamics. Moreover, we obtain an explicit expression for the optimal linear rate by analyzing the spectrum of the Witten Laplacian, a special case of the Schrödinger operator associated with the lr-dependent SDE.
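To make the surrogate idea concrete, the following Python sketch (an illustration under stated assumptions, not the construction from the talk) compares SGD with additive unit-variance gradient noise against an Euler-Maruyama discretization of an SDE of the form dX_t = -grad f(X_t) dt + sqrt(eta) dW_t, a common form of lr-dependent surrogate in the related literature. The quadratic objective f(x) = x^2/2, the noise model, and the choice eta = 0.1 are all illustrative assumptions.

# Hypothetical sketch: compare SGD iterates with an Euler-Maruyama
# discretization of a learning-rate-dependent SDE of the assumed form
#   dX_t = -grad f(X_t) dt + sqrt(eta) dW_t
# on the 1-D quadratic f(x) = x^2 / 2, whose gradient is x.
import numpy as np

rng = np.random.default_rng(0)

eta = 0.1        # learning rate (also scales the SDE's diffusion term)
n_steps = 1000
x_sgd = 5.0      # SGD iterate
x_sde = 5.0      # SDE path, discretized with step size dt = eta

def grad(x):
    """Gradient of f(x) = x^2 / 2."""
    return x

for _ in range(n_steps):
    # SGD: true gradient plus unit-variance noise standing in for
    # minibatch gradient noise.
    x_sgd -= eta * (grad(x_sgd) + rng.standard_normal())
    # Euler-Maruyama step with dt = eta: the Brownian increment has
    # standard deviation sqrt(dt) = sqrt(eta), so the diffusion term
    # contributes sqrt(eta) * sqrt(eta) * N(0, 1) = eta * N(0, 1),
    # matching the SGD noise term in distribution.
    x_sde += -eta * grad(x_sde) + np.sqrt(eta) * np.sqrt(eta) * rng.standard_normal()

print(f"SGD iterate after {n_steps} steps: {x_sgd:+.4f}")
print(f"SDE path  after {n_steps} steps: {x_sde:+.4f}")

Under these assumptions the two updates agree in distribution at each step, which is why the continuous-time process can stand in for SGD; how the learning rate enters the diffusion term is what drives the spectral analysis described in the abstract.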


Authors who are presenting talks have a * after their name.
