Abstract:
|
Variational inference proceeds by minimizing the KL divergence between an approximating distribution and a (not necessarily normalized) target distribution. It can often find good parameter estimates more quickly than MCMC, and it avoids the degeneracies that plague MAP inference, but in some cases it falls prey to poor local optima that arise in neither MAP nor MCMC inference. These local optima are created by an entropy term in the variational objective. In this work, I propose replacing this entropy term with an upper bound that eliminates these local optima while still encouraging groups of hidden variables to match the expected sufficient statistics of their prior. I demonstrate that this modification allows variational autoencoders to learn much higher-dimensional latent spaces than was previously possible.
|