All Times EDT

Abstract Details

Activity Number: 319 - SLDS CSpeed 6
Type: Contributed
Date/Time: Wednesday, August 11, 2021 : 3:30 PM to 5:20 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #317995
Title: A Dynamical View on Optimization Algorithms of Overparameterized Neural Networks
Author(s): Zhiqi Bu* and Shiyun Xu and Kan Chen
Companies: University of Pennsylvania and University of Pennsylvania and University of Pennsylvania
Keywords: Neural network; Overparameterization; Dynamical system; Neural tangent kernel; Optimization algorithm; Convergence

Over-parameterized neural networks have demonstrated fast convergence and strong performance even though the loss function is non-convex and non-smooth. While many works have focused on understanding the loss dynamics of neural networks trained with gradient descent (GD), in this work we consider a broad class of optimization algorithms that are commonly used in practice. For example, we show from a dynamical system perspective that the Heavy Ball (HB) method can converge to a global minimum of the mean squared error (MSE) at a linear rate (similar to GD); however, Nesterov accelerated gradient descent (NAG) may converge to a global minimum only sublinearly.
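
As a rough illustration of the linear-versus-sublinear contrast, the sketch below integrates two damped scalar ODEs for a quadratic loss. The ODE forms are assumptions and are not taken from this abstract: constant friction stands in for HB, and the vanishing friction 3/t, a commonly used continuous-time model of NAG, stands in for NAG. The names `simulate`, `friction`, and `lam` are hypothetical.

```python
import numpy as np

def simulate(friction, lam=1.0, t0=0.1, t_end=50.0, dt=0.01):
    """Integrate e'' + friction(t) * e' + lam * e = 0 with classical RK4.

    Starts from e = 1, e' = 0 at t0 > 0 (t0 avoids the 3/t singularity).
    Returns the time grid and the energy 0.5 * e'^2 + 0.5 * lam * e^2.
    """
    def rhs(t, s):
        e, v = s
        return np.array([v, -friction(t) * v - lam * e])

    n_steps = int(round((t_end - t0) / dt))
    s, t = np.array([1.0, 0.0]), t0
    times, energy = [t], [0.5 * s[1] ** 2 + 0.5 * lam * s[0] ** 2]
    for _ in range(n_steps):
        k1 = rhs(t, s)
        k2 = rhs(t + dt / 2, s + dt / 2 * k1)
        k3 = rhs(t + dt / 2, s + dt / 2 * k2)
        k4 = rhs(t + dt, s + dt * k3)
        s = s + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
        times.append(t)
        energy.append(0.5 * s[1] ** 2 + 0.5 * lam * s[0] ** 2)
    return np.array(times), np.array(energy)

# Assumed HB-style ODE: constant friction (2.0 is critical damping for lam = 1).
t_hb, E_hb = simulate(lambda t: 2.0)
# Assumed NAG-style ODE: friction 3/t that vanishes as t grows.
t_nag, E_nag = simulate(lambda t: 3.0 / t)

print("HB  energy at t_end:", E_hb[-1])
print("NAG energy at t_end:", E_nag[-1])
```

Under these assumptions the constant-friction energy decays exponentially in t, while the 3/t-friction energy decays only polynomially (roughly like 1/t^3 on this quadratic), which mirrors the contrast described above in the simplest possible setting.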

Our results rely on the connection between the neural tangent kernel (NTK) and finite over-parameterized neural networks with ReLU activation, which leads to analyzing the limiting ordinary differential equations (ODEs) for optimization algorithms. We show that optimizing the non-convex loss over the weights corresponds to optimizing a strongly convex loss over the prediction error. As a consequence, we can leverage classical convex optimization theory to understand the convergence behavior of neural networks.
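
The correspondence between weight space and prediction-error space can be seen in a minimal sketch that replaces the network by a model linear in its weights, which is a simplifying assumption and not the ReLU network analyzed in this work. Here `J` is a hypothetical fixed Jacobian and `H = J @ J.T` plays the role of the kernel; with more parameters than samples, `H` is positive definite, so the quadratic `0.5 * e.T @ H @ e` in the error `e` is strongly convex.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 200                                  # n samples, p parameters (p >> n)
J = rng.standard_normal((n, p)) / np.sqrt(p)   # hypothetical fixed Jacobian
y = rng.standard_normal(n)                     # synthetic targets

H = J @ J.T                                    # n x n kernel (Gram) matrix
lam_min, lam_max = np.linalg.eigvalsh(H)[[0, -1]]
eta = 1.0 / lam_max                            # step size

w = np.zeros(p)
e_weights = J @ w - y                          # error tracked through the weights
e_direct = e_weights.copy()                    # error tracked directly
norms = [np.linalg.norm(e_weights)]
max_gap = 0.0
for _ in range(20):
    # GD on the weights for L(w) = 0.5 * ||J w - y||^2
    w = w - eta * J.T @ (J @ w - y)
    e_weights = J @ w - y
    # GD on the error for the quadratic 0.5 * e^T H e
    e_direct = e_direct - eta * H @ e_direct
    max_gap = max(max_gap, float(np.max(np.abs(e_weights - e_direct))))
    norms.append(np.linalg.norm(e_weights))

print("smallest eigenvalue of H:", lam_min)
print("largest gap between the two error trajectories:", max_gap)
print("error norm after 20 steps:", norms[-1])
```

In this toy setting the two trajectories coincide up to rounding, and the error norm contracts by at least a factor `1 - eta * lam_min` per step, which is the classical linear rate for GD on a strongly convex quadratic.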

Authors who are presenting talks have a * after their name.
