
Abstract Details

Activity Number: 5 - New Developments in Modern Statistical Theory
Type: Invited
Date/Time: Sunday, July 28, 2019, 2:00 PM to 3:50 PM
Sponsor: IMS
Abstract #300578
Title: Towards Demystifying Over-Parameterization in Deep Learning
Author(s): Mahdi Soltanolkotabi*
Companies: University of Southern California
Keywords: nonconvex optimization; neural networks; generalization

Many modern neural networks are trained in an over-parameterized regime, where the number of parameters in the model exceeds the size of the training dataset. Training these models involves highly non-convex landscapes, and it is not clear how methods such as (stochastic) gradient descent provably find globally optimal models. Furthermore, due to their over-parameterized nature, these neural networks in principle have the capacity to (over)fit any set of labels, including pure noise. Despite this high fitting capacity, somewhat paradoxically, neural network models trained via first-order methods continue to predict well on yet-unseen test data. In this talk I will discuss some results aimed at demystifying such phenomena by demonstrating that gradient methods enjoy a few intriguing properties: (1) when initialized at random, the iterates converge at a geometric rate to a global optimum; (2) among all global optima of the loss, the iterates converge to one with near-minimal distance to the initial estimate; (3) the iterates are provably robust to noise/corruption/shuffling on a fraction of the labels, with these algorithms fitting only the correct labels and ignoring the corrupted ones.
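The first phenomenon above can be seen in a toy experiment. The sketch below is an illustration of the general setting, not the speaker's specific construction: a one-hidden-layer ReLU network whose parameter count (200 hidden units times 5 input weights each) far exceeds the 10 training points, trained by plain gradient descent on arbitrary random labels. The width, learning rate, and step count are illustrative choices.

```python
import numpy as np

# Toy illustration (assumed setup, not from the talk): gradient descent on an
# over-parameterized one-hidden-layer ReLU network. With n = 10 training
# points and k = 200 hidden units, the parameters vastly outnumber the data,
# and gradient descent drives the training loss close to zero even on
# pure-noise labels.
rng = np.random.default_rng(0)
n, d, k = 10, 5, 200                               # samples, input dim, width
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)                         # arbitrary ("noise") labels

W = rng.standard_normal((k, d)) / np.sqrt(d)       # random initialization
v = rng.choice([-1.0, 1.0], size=k) / np.sqrt(k)   # fixed output layer

def loss(W):
    """Mean squared error of the ReLU network on the training set."""
    pred = np.maximum(X @ W.T, 0.0) @ v
    return 0.5 * np.mean((pred - y) ** 2)

lr = 0.3
for step in range(5000):
    H = X @ W.T                                    # pre-activations, (n, k)
    pred = np.maximum(H, 0.0) @ v
    resid = (pred - y) / n                         # dLoss/dPred
    # Backprop through the ReLU: each hidden unit j accumulates the inputs of
    # the samples on which it is active, weighted by v_j and the residual.
    grad_W = ((H > 0) * (resid[:, None] * v[None, :])).T @ X
    W = W - lr * grad_W

print(loss(W))                                     # final training loss
```

Despite the labels carrying no signal, the over-parameterized model interpolates them, which is exactly why generalization in this regime is surprising and requires the kind of analysis the talk describes.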

Authors who are presenting talks have a * after their name.

Back to the full JSM 2019 program