FineGAN: Unsupervised Hierarchical Disentanglement for Fine-Grained Object Generation and Discovery

We propose FineGAN, a novel unsupervised GAN framework, which disentangles the background, object shape, and object appearance to hierarchically generate images of fine-grained object categories. To disentangle the factors without supervision, our key idea is to use information theory to associate each factor to a latent code, and to condition the relationships between the codes in a specific way to induce the desired hierarchy. Through extensive experiments, we show that FineGAN achieves the desired disentanglement to generate realistic and diverse images belonging to fine-grained classes of birds, dogs, and cars. Using FineGAN's automatically learned features, we also cluster real images as a first attempt at solving the novel problem of unsupervised fine-grained object category discovery. Our code/models/demo can be found at

CVPR 2019

Results from the Paper

Task              Dataset         Model    Metric Name      Metric Value  Global Rank
Image Generation  CUB 128 x 128   FineGAN  FID              11.25         # 2
Image Generation  CUB 128 x 128   FineGAN  Inception score  52.53         # 1
Image Clustering  CUB Birds       FineGAN  Accuracy         0.126         # 1
Image Clustering  CUB Birds       FineGAN  NMI              0.403         # 1
Image Clustering  Stanford Cars   FineGAN  Accuracy         0.078         # 1
Image Clustering  Stanford Cars   FineGAN  NMI              0.354         # 1
Image Generation  Stanford Cars   FineGAN  FID              16.03         # 2
Image Generation  Stanford Cars   FineGAN  Inception score  32.62         # 1
Image Clustering  Stanford Dogs   FineGAN  Accuracy         0.079         # 1
Image Clustering  Stanford Dogs   FineGAN  NMI              0.233         # 1
Image Generation  Stanford Dogs   FineGAN  FID              25.66         # 2
Image Generation  Stanford Dogs   FineGAN  Inception score  46.92         # 1
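
The clustering rows above report Accuracy and NMI (normalized mutual information). The following is a minimal sketch of how these two metrics are commonly computed. It is not taken from the FineGAN code. It assumes that accuracy means the best one-to-one mapping between cluster ids and ground-truth classes, and that NMI is normalized by the arithmetic mean of the two entropies; the page does not state which definitions were used. The function names nmi and clustering_accuracy and the toy labels are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def nmi(labels_true, labels_pred):
    """NMI, normalized by the arithmetic mean of the two label entropies
    (an assumed convention; other normalizations exist)."""
    n = len(labels_true)
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    count_joint = Counter(zip(labels_true, labels_pred))

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    # I(T; P) = sum over (t, p) of p(t, p) * log(p(t, p) / (p(t) * p(p)))
    mutual_info = sum(
        (c / n) * math.log((c * n) / (count_true[t] * count_pred[p]))
        for (t, p), c in count_joint.items()
    )
    denom = (entropy(count_true) + entropy(count_pred)) / 2
    return mutual_info / denom if denom > 0 else 1.0


def clustering_accuracy(labels_true, labels_pred):
    """Fraction of samples that agree with their class under the best
    one-to-one cluster-to-class mapping.

    Brute force over permutations, so only practical for a handful of
    clusters. With hundreds of classes, a Hungarian-algorithm solver
    would be used instead.
    """
    classes = sorted(set(labels_true))
    clusters = sorted(set(labels_pred))
    joint = Counter(zip(labels_pred, labels_true))

    best = 0
    if len(clusters) <= len(classes):
        # Assign every cluster to a distinct class.
        for perm in permutations(classes, len(clusters)):
            hits = sum(joint[(cl, cls)] for cl, cls in zip(clusters, perm))
            best = max(best, hits)
    else:
        # More clusters than classes: pick a distinct cluster per class.
        for perm in permutations(clusters, len(classes)):
            hits = sum(joint[(cl, cls)] for cl, cls in zip(perm, classes))
            best = max(best, hits)
    return best / len(labels_true)


# Toy labels (made up for illustration, not data from the paper).
toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [1, 1, 0, 2, 2, 2]
print("accuracy:", clustering_accuracy(toy_true, toy_pred))
print("NMI:", nmi(toy_true, toy_pred))
```

Both metrics ignore the actual cluster ids, so a clustering that only permutes the class labels scores perfectly on each.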