Abstract:
|
Neural nets have made an amazing comeback during the past decade. Their empirical success has been truly phenomenal, but neural nets are poorly understood in a mathematical sense compared to classical methods like splines, kernels, and wavelets. This talk describes recent steps towards a mathematical theory of neural networks comparable to the foundations we have for classical nonparametric methods. Surprisingly, neural nets are minimax optimal in a wide variety of classical univariate function spaces, including those handled by splines and wavelets. In multivariate settings, neural nets are solutions to data-fitting problems cast in entirely new types of multivariate function spaces characterized through total variation (TV) measured in the Radon transform domain. Deep (multilayer) neural nets, in turn, naturally represent compositions of functions in these Radon-BV (bounded variation) spaces. Remarkably, this theory provides novel explanations for many notable empirical discoveries in deep learning, including the benefits of “skip connections” and of sparse and low-rank weight matrices. Radon-BV spaces set the stage for the nonparametric theory of neural nets.
|