Clearing the Path for Truly Semantic Representation Learning

1 Jan 2021 · Dominik Zietlow, Michal Rolinek, Georg Martius

The performance of $\beta$-Variational-Autoencoders ($\beta$-VAEs) and their variants on learning semantically meaningful, disentangled representations is unparalleled. On the other hand, there are theoretical arguments suggesting the impossibility of unsupervised disentanglement. In this work, we show that small perturbations of existing datasets hide the convenient correlation structure that is easily exploited by VAE-based architectures. To demonstrate this, we construct modified versions of the standard datasets on which (i) the generative factors are perfectly preserved; (ii) each image undergoes a transformation barely visible to the human eye; and (iii) the leading disentanglement architectures fail to produce disentangled representations. We intend for these datasets to play a role in separating correlation-based models from those that discover the true causal structure. The construction of the modifications is non-trivial and relies on recent progress in the mechanistic understanding of $\beta$-VAEs and their connection to PCA, while also providing additional insights that might be of stand-alone interest.
