Maximum likelihood estimation of mixture models poses non-convex optimization problems. Standard algorithms such as expectation-maximization (EM) can converge to a spurious local minimizer, in both theory and practice. In this talk, we present two results on the structure of these local minima.
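As a concrete instance of this setup (a standard formulation, not spelled out in the abstract), the maximum likelihood estimate of the centers of a k-component Gaussian mixture with uniform weights and known variance minimizes a negative log-likelihood of the following form:

```latex
% Negative log-likelihood for a k-component Gaussian mixture with centers
% \mu_1,\dots,\mu_k, uniform weights, and known variance \sigma^2
% (additive constants dropped). The sum inside the logarithm is what
% makes L non-convex in the centers.
L(\mu_1,\dots,\mu_k)
  = -\sum_{i=1}^{n} \log\!\Big( \frac{1}{k} \sum_{j=1}^{k}
      \exp\!\big( -\tfrac{1}{2\sigma^2}\, \lVert x_i - \mu_j \rVert^2 \big) \Big)
```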
For a mixture of two symmetric components, we show that all local minima are globally optimal, and that EM converges to a global optimum from a random initialization. This result applies to a broad class of log-concave mixture problems, including mixtures of Gaussians and mixtures of linear regressions. Our analysis highlights that EM may fail to make progress in Euclidean distance, necessitating the consideration of other metrics.
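For the symmetric two-component Gaussian mixture 0.5 N(θ*, σ²I) + 0.5 N(−θ*, σ²I), the E- and M-steps collapse into the well-known closed-form update θ ← (1/n) Σᵢ tanh(⟨θ, xᵢ⟩/σ²) xᵢ. The sketch below is our illustration of this iteration from a random start (the function name and parameters are ours, not from the talk):

```python
import numpy as np

def em_symmetric_2gmm(X, sigma=1.0, n_iters=200, seed=0):
    """EM for the mixture 0.5*N(theta, s^2 I) + 0.5*N(-theta, s^2 I).

    E- and M-steps reduce to: theta <- mean_i tanh(<theta, x_i>/s^2) x_i.
    """
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(X.shape[1])    # random initialization
    for _ in range(n_iters):
        w = np.tanh(X @ theta / sigma**2)      # posterior sign weights in [-1, 1]
        theta = (w[:, None] * X).mean(axis=0)  # weighted-mean M-step
    return theta

# Synthetic data from the model; up to the inherent sign ambiguity,
# theta_hat should approach +/- theta_star.
rng = np.random.default_rng(1)
theta_star = np.ones(5)
signs = rng.choice([-1.0, 1.0], size=10_000)
X = signs[:, None] * theta_star + rng.standard_normal((10_000, 5))
theta_hat = em_symmetric_2gmm(X)
print(min(np.linalg.norm(theta_hat - theta_star),
          np.linalg.norm(theta_hat + theta_star)))
```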
For a mixture of more than two components, spurious local minima provably exist. One such local minimizer places multiple centers at a single true component and another center in the middle of several true components. We prove that, under a separation condition, this is essentially the only type of spurious local minimum. Consequently, EM is guaranteed to converge to a solution that at least partially identifies the true mixture model. We will discuss further algorithmic implications of these results for over-parameterization.
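A small experiment can exhibit the spurious configuration described above (a minimal sketch assuming uniform weights and known unit variance; the specific initializations are ours): with three well-separated 1-D components, starting two centers near one true component and one center between the other two leaves EM stuck near that configuration, while a benign initialization recovers all three centers.

```python
import numpy as np

def em_gmm_1d(X, mu, sigma=1.0, n_iters=500):
    """Plain EM for a 1-D Gaussian mixture with uniform weights, known variance."""
    for _ in range(n_iters):
        # E-step: responsibilities via a log-domain softmax for stability.
        logp = -0.5 * (X[:, None] - mu[None, :]) ** 2 / sigma**2
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: each center moves to its responsibility-weighted mean.
        mu = (r * X[:, None]).sum(axis=0) / r.sum(axis=0)
    return mu

rng = np.random.default_rng(0)
true_mu = np.array([-10.0, 0.0, 10.0])  # well-separated true centers
X = np.concatenate([m + rng.standard_normal(2000) for m in true_mu])

# Spurious initialization: two centers at one true component, one center
# between the other two; EM stays near this configuration.
print(em_gmm_1d(X, mu=np.array([-10.5, -9.5, 5.0])))
# Benign initialization: EM recovers approximately (-10, 0, 10).
print(em_gmm_1d(X, mu=np.array([-8.0, 1.0, 9.0])))
```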