Abstract:
|
Modern machine learning methods are often overparametrized, allowing adaptation to the data at a fine level. This can seem puzzling: in the worst case, such models do not need to generalize, yet in practice they often do. Here we develop a deeper understanding of this phenomenon. Specifically, we propose using the \emph{analysis of variance} (ANOVA) to decompose the variance of the test error in a symmetric way, in order to study the generalization performance of certain two-layer linear and non-linear networks. One key insight is that, in typical settings, the \emph{interaction} between the training samples and the initialization can dominate the variance; surprisingly, it can be larger than their marginal effects.
|
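For concreteness, the following is a minimal sketch of the symmetric ANOVA decomposition the abstract alludes to, in illustrative notation of our own (the paper's setting may include additional factors, such as label noise). Writing the test error as a function $f(S, I)$ of the training sample $S$ and the random initialization $I$, assumed independent, the Hoeffding/ANOVA decomposition gives

\[
\operatorname{Var}\bigl(f(S,I)\bigr)
  = \underbrace{\operatorname{Var}_S\bigl(\mathbb{E}_I[f \mid S]\bigr)}_{\text{main effect of } S}
  + \underbrace{\operatorname{Var}_I\bigl(\mathbb{E}_S[f \mid I]\bigr)}_{\text{main effect of } I}
  + \underbrace{\Sigma_{S,I}}_{\text{interaction}},
\]

where the interaction term is defined by subtraction, $\Sigma_{S,I} = \operatorname{Var}(f) - \operatorname{Var}_S(\mathbb{E}_I[f \mid S]) - \operatorname{Var}_I(\mathbb{E}_S[f \mid I])$. The key claim above is that $\Sigma_{S,I}$ can exceed both main effects.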