InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

This paper describes InfoGAN, an information-theoretic extension to the Generative Adversarial Network that is able to learn disentangled representations in a completely unsupervised manner. InfoGAN is a generative adversarial network that also maximizes the mutual information between a small subset of the latent variables and the observation. We derive a lower bound to the mutual information objective that can be optimized efficiently, and show that our training procedure can be interpreted as a variation of the Wake-Sleep algorithm. Specifically, InfoGAN successfully disentangles writing styles from digit shapes on the MNIST dataset, pose from lighting of 3D rendered images, and background digits from the central digit on the SVHN dataset. It also discovers visual concepts that include hair styles, presence/absence of eyeglasses, and emotions on the CelebA face dataset. Experiments show that InfoGAN learns interpretable representations that are competitive with representations learned by existing fully supervised methods.
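The abstract mentions a lower bound on the mutual information but does not write it out. As a rough illustration, the sketch below estimates a bound of the form E[log Q(c|x)] + H(c) for a single categorical latent code, where Q is an auxiliary network that tries to recover the code c from a generated sample x. The function and argument names (`mi_lower_bound`, `q_logits`, `codes`, `prior`) are hypothetical and chosen for this example; the generator and discriminator are omitted, and the Q outputs are hand-constructed rather than produced by a trained model.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the last axis of a 2-D array.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def mi_lower_bound(q_logits, codes, prior):
    """Monte Carlo estimate of E[log Q(c|x)] + H(c) for a categorical code.

    q_logits: (batch, k) unnormalized scores from the auxiliary network Q
              evaluated on generated samples (assumed given here).
    codes:    (batch,) integer codes that were fed to the generator.
    prior:    (k,) probabilities of the code prior P(c).
    """
    log_q = log_softmax(q_logits)
    # Log-probability Q assigns to the code actually used for each sample.
    log_q_c = log_q[np.arange(len(codes)), codes]
    # Entropy of the prior; constant with respect to the networks.
    entropy = -np.sum(prior * np.log(prior))
    return log_q_c.mean() + entropy

# Illustrative inputs: a uniform prior over 10 categories.
rng = np.random.default_rng(0)
prior = np.full(10, 0.1)
codes = rng.integers(0, 10, size=256)

uninformed = np.zeros((256, 10))       # Q ignores x and guesses uniformly
confident = 20.0 * np.eye(10)[codes]   # Q recovers c almost surely

print(mi_lower_bound(uninformed, codes, prior))  # close to 0
print(mi_lower_bound(confident, codes, prior))   # close to log(10)
```

When Q carries no information about the code, the estimate sits near zero; when Q recovers the code reliably, it approaches the prior entropy H(c), the largest value the mutual information can take. In training, a term of this kind would be added to the usual GAN objective with a weighting coefficient, so the generator is rewarded for producing samples from which the code can be read back.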

Venue: NeurIPS 2016

Results from the Paper

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Unsupervised MNIST | MNIST | InfoGAN | Accuracy | 95 | #8 |
| Unsupervised Image Classification | MNIST | InfoGAN | Accuracy | 95 | #7 |

Results from Other Papers

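| Task | Dataset | Model | Metric Name | Metric Value | Rank |
|---|---|---|---|---|---|
| Image Generation | CUB 128 x 128 | InfoGAN | FID | 13.20 | #3 |
| Image Generation | CUB 128 x 128 | InfoGAN | Inception score | 47.32 | #2 |
| Image Generation | Stanford Cars | InfoGAN | FID | 17.63 | #3 |
| Image Generation | Stanford Cars | InfoGAN | Inception score | 28.62 | #2 |
| Image Generation | Stanford Dogs | InfoGAN | FID | 29.34 | #3 |
| Image Generation | Stanford Dogs | InfoGAN | Inception score | 43.16 | #2 |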