Abstract:

Understanding how overparameterized models can generalize well is a central question in modern learning theory. The widely used uniform convergence framework seeks to answer this question by bounding the test error of the worst-case model. In this talk, we revisit the statistical mechanics approach to learning, which instead attempts to understand the behavior of the typical model. To quantify this typicality in the setting of overparameterized linear classification, we develop a method to accurately compute the fraction of interpolating classifiers that attain a given (small) test error value. Empirically, we find that in many regimes this fraction is nearly one, indicating that most linear classifiers have a low test error. We also observe an interesting phase transition: for a given training and testing set, there is a critical test error below which this fraction is identically zero and above which it quickly approaches one. Hence, this critical value characterizes the test performance of the typical classifier. We then study this phenomenon theoretically and derive simple expressions for the critical error value, which qualitatively replicate the empirical behavior.
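The quantity studied above (the fraction of interpolating classifiers attaining a given test error) can be illustrated with a simple Monte Carlo sketch, not the authors' actual method: on hypothetical synthetic data in the overparameterized regime (dimension d larger than the number of training points n), we rejection-sample random linear classifiers, keep those that interpolate (classify every training point correctly), and record their test errors. All parameter values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical overparameterized setup: d > n_train, labels from a ground-truth direction.
n_train, n_test, d = 10, 2000, 50
w_star = rng.normal(size=d)
X_train = rng.normal(size=(n_train, d))
y_train = np.sign(X_train @ w_star)
X_test = rng.normal(size=(n_test, d))
y_test = np.sign(X_test @ w_star)

def sample_interpolator_errors(target=1000, batch=20000):
    """Rejection-sample random directions; keep interpolators and return their test errors."""
    errs = []
    while len(errs) < target:
        W = rng.normal(size=(batch, d))
        # A classifier interpolates if it gets every training label right.
        ok = np.all(np.sign(W @ X_train.T) == y_train, axis=1)
        for w in W[ok]:
            errs.append(np.mean(np.sign(X_test @ w) != y_test))
    return np.array(errs[:target])

test_errors = sample_interpolator_errors()

# Empirical fraction of interpolating classifiers with test error <= eps:
# near zero below some critical value, rising toward one above it.
for eps in (0.1, 0.2, 0.3, 0.4):
    print(f"eps={eps:.1f}: fraction = {np.mean(test_errors <= eps):.3f}")
```

The sweep over eps traces the empirical analogue of the phase transition described in the abstract, though at these small sizes the transition is smoothed rather than sharp.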
