Diffusion Models Beat GANs on Image Synthesis

NeurIPS 2021  ·  Prafulla Dhariwal, Alex Nichol

We show that diffusion models can achieve image sample quality superior to the current state-of-the-art generative models. We achieve this on unconditional image synthesis by finding a better architecture through a series of ablations. For conditional image synthesis, we further improve sample quality with classifier guidance: a simple, compute-efficient method for trading off diversity for fidelity using gradients from a classifier. We achieve an FID of 2.97 on ImageNet 128$\times$128, 4.59 on ImageNet 256$\times$256, and 7.72 on ImageNet 512$\times$512, and we match BigGAN-deep even with as few as 25 forward passes per sample, all while maintaining better coverage of the distribution. Finally, we find that classifier guidance combines well with upsampling diffusion models, further improving FID to 3.94 on ImageNet 256$\times$256 and 3.85 on ImageNet 512$\times$512. We release our code at https://github.com/openai/guided-diffusion
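
As a rough illustration of the classifier guidance idea described in the abstract, the sketch below shifts the mean of one denoising step by a scaled classifier gradient, i.e. it samples around $\mu + s\,\Sigma\,\nabla_{x_t} \log p(y \mid x_t)$ instead of $\mu$. The toy logistic classifier, the diagonal variance, and the names `grad_log_prob` and `guided_mean` are assumptions made for this example only; this is not the released guided-diffusion code, which uses a neural network classifier and automatic differentiation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def grad_log_prob(x, w, b):
    """Gradient w.r.t. x of log p(y=1 | x) for a toy logistic classifier.

    Hypothetical stand-in for the noisy-image classifier: here
    p(y=1 | x) = sigmoid(w . x + b), so the gradient is (1 - p) * w.
    """
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return [(1.0 - sigmoid(z)) * wi for wi in w]


def guided_mean(mu, var, x_t, w, b, scale):
    """Shift the denoising mean by scale * var * grad log p(y | x_t).

    mu and var are the per-dimension mean and variance that the diffusion
    model predicts for x_{t-1}; var is assumed diagonal in this sketch.
    """
    g = grad_log_prob(x_t, w, b)
    return [m + scale * v * gi for m, v, gi in zip(mu, var, g)]


if __name__ == "__main__":
    mu = [0.0, 0.0]
    var = [0.1, 0.1]
    x_t = [0.0, 0.0]
    w, b = [1.0, -1.0], 0.0
    for scale in (0.0, 1.0, 2.0):
        print(scale, guided_mean(mu, var, x_t, w, b, scale))
```

With `scale=0.0` the mean is left unchanged (no guidance). Larger scales push samples further toward regions the classifier assigns to the target class, which is the diversity-for-fidelity trade-off the abstract refers to.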


Results from the Paper


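| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Conditional Image Generation | ImageNet 128x128 | ADM-G (classifier_scale=0.5) | FID | 2.97 | #5 |
| Image Generation | ImageNet 128x128 | ADM-G | FID | 2.97 | #10 |
| Conditional Image Generation | ImageNet 256x256 | ADM-G | FID | 4.59 | #3 |
| Conditional Image Generation | ImageNet 256x256 | ADM-G | Inception score | 186.7 | #3 |
| Image Generation | ImageNet 256x256 | ADM-G | FID | 4.59 | #57 |
| Image Generation | ImageNet 256x256 | ADM-G, ADM-U | FID | 3.94 | #49 |
| Image Generation | ImageNet 512x512 | ADM-G, ADM-U | FID | 3.85 | #29 |
| Image Generation | ImageNet 512x512 | ADM-G, ADM-U | Inception score | 221.72 | #7 |
| Image Generation | ImageNet 512x512 | ADM-G | FID | 7.72 | #34 |
| Image Generation | ImageNet 512x512 | ADM-G | Inception score | 172.71 | #10 |
| Image Generation | ImageNet 64x64 | ADM (dropout) | FID | 2.07 | #14 |
| Image Generation | LSUN Bedroom 256 x 256 | ADM (dropout, DINOv2) | FD | 59.64 | #1 |
| Image Generation | LSUN Bedroom 256 x 256 | ADM (dropout, DINOv2) | Precision | 0.85 | #1 |
| Image Generation | LSUN Bedroom 256 x 256 | ADM (dropout, DINOv2) | Recall | 0.75 | #1 |
| Image Generation | LSUN Bedroom 256 x 256 | ADM (dropout) | FID | 1.90 | #4 |
| Image Generation | LSUN Cat 256 x 256 | ADM (dropout) | FID | 5.57 | #3 |
| Image Generation | LSUN Horse 256 x 256 | ADM (dropout) | FID | 2.57 | #3 |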

Methods