$\alpha$-VAEs: Optimising variational inference by learning data-dependent divergence skew

The {\em skew-geometric Jensen-Shannon divergence} $\left(\textrm{JS}^{\textrm{G}_{\alpha}}\right)$ allows for an intuitive interpolation between forward and reverse Kullback-Leibler (KL) divergence based on the skew parameter $\alpha$. While the benefits of the skew in $\textrm{JS}^{\textrm{G}_{\alpha}}$ are clear---balancing forward/reverse KL in a comprehensible manner---the choice of optimal skew remains opaque and requires an expensive grid search. In this paper we introduce $\alpha$-VAEs, which extend the $\textrm{JS}^{\textrm{G}_{\alpha}}$ variational autoencoder by allowing for a learnable, and therefore data-dependent, skew. We motivate the use of a parameterised skew in the dual divergence by analysing trends dependent on data complexity in synthetic examples. We also prove and discuss the dependency of the divergence minimum on the input data and encoder parameters, before empirically demonstrating that this dependency does not reduce to either direction of KL divergence for benchmark datasets. Finally, we demonstrate that optimised skew values consistently converge across a range of initial values and provide improved denoising and reconstruction properties. These render $\alpha$-VAEs an efficient and practical modelling choice across a range of tasks, datasets, and domains.
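Below is a minimal, illustrative sketch (not the authors' code) of the quantities the abstract describes: the weighted geometric mean $\textrm{G}_{\alpha}$ of two diagonal Gaussians, a skew-geometric JS divergence assembled from closed-form Gaussian KL terms, and a skew parameterised through a sigmoid so that $\alpha$ stays in $(0,1)$ and can be optimised jointly with the encoder and decoder. It assumes the convention $\textrm{G}_{\alpha} \propto p^{1-\alpha} q^{\alpha}$ and the weighting $\textrm{JS}^{\textrm{G}_{\alpha}}(p\,\|\,q) = (1-\alpha)\,\textrm{KL}(p\,\|\,\textrm{G}_{\alpha}) + \alpha\,\textrm{KL}(q\,\|\,\textrm{G}_{\alpha})$; the dual form used in the paper may orient or weight the two KL terms differently, and names such as skew_geometric_jsd and LearnableSkew are hypothetical.

```python
# Illustrative sketch only: skew-geometric JS regulariser between diagonal
# Gaussians with a learnable skew alpha. The geometric-mean and KL-orientation
# conventions are assumptions and may differ from the paper's dual form.
import torch
import torch.nn as nn


def gaussian_kl(mu_a, var_a, mu_b, var_b):
    # Closed-form KL(N(mu_a, var_a) || N(mu_b, var_b)), element-wise per dimension.
    return 0.5 * (torch.log(var_b / var_a) + (var_a + (mu_a - mu_b) ** 2) / var_b - 1.0)


def geometric_mean_gaussian(mu_p, var_p, mu_q, var_q, alpha):
    # Normalised weighted geometric mean, proportional to p^(1-alpha) * q^alpha,
    # which is again Gaussian with precision-weighted parameters.
    var_a = 1.0 / ((1.0 - alpha) / var_p + alpha / var_q)
    mu_a = var_a * ((1.0 - alpha) * mu_p / var_p + alpha * mu_q / var_q)
    return mu_a, var_a


def skew_geometric_jsd(mu_p, var_p, mu_q, var_q, alpha):
    # (1 - alpha) * KL(p || G_alpha) + alpha * KL(q || G_alpha),
    # summed over latent dimensions.
    mu_a, var_a = geometric_mean_gaussian(mu_p, var_p, mu_q, var_q, alpha)
    return ((1.0 - alpha) * gaussian_kl(mu_p, var_p, mu_a, var_a)
            + alpha * gaussian_kl(mu_q, var_q, mu_a, var_a)).sum(dim=-1)


class LearnableSkew(nn.Module):
    # Data-dependent skew: an unconstrained parameter squashed to (0, 1),
    # trained jointly with the encoder and decoder.
    def __init__(self, init_alpha=0.5):
        super().__init__()
        init = torch.tensor(float(init_alpha))
        self.raw_alpha = nn.Parameter(torch.log(init) - torch.log1p(-init))  # logit(init)

    def forward(self):
        return torch.sigmoid(self.raw_alpha)
```

In a VAE training step with a standard normal prior, the regulariser would be skew_geometric_jsd(mu, logvar.exp(), torch.zeros_like(mu), torch.ones_like(mu), skew()) averaged over the batch and added to the reconstruction loss, so that gradients update the skew alongside the encoder and decoder parameters.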
