Despite the advances in the representational capacity of approximate
distributions for variational inference, the optimization process can still
limit the density that is ultimately learned. We demonstrate the drawbacks of
biasing the true posterior to be unimodal, and introduce Annealed Variational
Objectives (AVO) into the training of hierarchical variational methods...
Inspired by Annealed Importance Sampling, the proposed method facilitates
learning by incorporating energy tempering into the optimization objective. In
our experiments, we demonstrate our method's robustness to deterministic warm
up, and the benefits of encouraging exploration in the latent space.