Your Diffusion Model is Secretly a Zero-Shot Classifier

The recent wave of large-scale text-to-image diffusion models has dramatically increased our text-based image generation abilities. These models can generate realistic images for a staggering variety of prompts and exhibit impressive compositional generalization abilities. Almost all use cases thus far have solely focused on sampling; however, diffusion models can also provide conditional density estimates, which are useful for tasks beyond image generation. In this paper, we show that the density estimates from large-scale text-to-image diffusion models like Stable Diffusion can be leveraged to perform zero-shot classification without any additional training. Our generative approach to classification, which we call Diffusion Classifier, attains strong results on a variety of benchmarks and outperforms alternative methods of extracting knowledge from diffusion models. Although a gap remains between generative and discriminative approaches on zero-shot recognition tasks, our diffusion-based approach has significantly stronger multimodal compositional reasoning ability than competing discriminative approaches. Finally, we use Diffusion Classifier to extract standard classifiers from class-conditional diffusion models trained on ImageNet. Our models achieve strong classification performance using only weak augmentations and exhibit qualitatively better "effective robustness" to distribution shift. Overall, our results are a step toward using generative over discriminative models for downstream tasks. Results and visualizations at https://diffusion-classifier.github.io/
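To make the idea of classifying with conditional density estimates concrete, here is a minimal sketch in Python. It rests on assumptions that are not spelled out in the abstract above: that the class score is a Monte Carlo estimate of the conditional noise-prediction error, and that the predicted class is the one with the lowest error. The function `toy_eps_predictor` is a hypothetical stand-in for a trained conditional denoiser (it is not Stable Diffusion or any real library API), and the class means, noise-level range, and sample count are illustrative values only.

```python
import numpy as np


def toy_eps_predictor(x_t, alpha_bar, class_mean):
    """Hypothetical stand-in for a conditional noise predictor eps_theta(x_t, t, c).

    ASSUMPTION: each class c is a single point `class_mean`, for which this
    closed form is the exact noise predictor. A real model would be a trained
    network conditioned on a text prompt or class label.
    """
    return (x_t - np.sqrt(alpha_bar) * class_mean) / np.sqrt(1.0 - alpha_bar)


def diffusion_classify(x, class_means, n_samples=64, seed=0):
    """Return (predicted class index, mean noise-prediction error per class)."""
    rng = np.random.default_rng(seed)
    # Sample noise levels and noises once and reuse them for every class,
    # so that the per-class errors are directly comparable.
    alpha_bars = rng.uniform(0.05, 0.95, size=n_samples)
    noises = rng.standard_normal((n_samples,) + x.shape)

    errors = np.zeros(len(class_means))
    for alpha_bar, eps in zip(alpha_bars, noises):
        # Forward-noise the input to the sampled noise level.
        x_t = np.sqrt(alpha_bar) * x + np.sqrt(1.0 - alpha_bar) * eps
        for c, mean in enumerate(class_means):
            eps_hat = toy_eps_predictor(x_t, alpha_bar, mean)
            errors[c] += np.sum((eps - eps_hat) ** 2)
    errors /= n_samples

    # Lower denoising error is treated as higher conditional likelihood.
    return int(np.argmin(errors)), errors


# Illustrative usage with three made-up 2-D "classes".
class_means = np.array([[0.0, 0.0], [3.0, 3.0], [-3.0, 3.0]])
pred, errs = diffusion_classify(np.array([2.6, 3.3]), class_means)
print(pred, errs)
```

In this toy setting the error for each class reduces to a scaled squared distance between the input and the class mean, so the sketch behaves like a nearest-mean classifier. With a real text-to-image model, the inner call would be replaced by a forward pass of the denoiser conditioned on each candidate class prompt.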

Venue: ICCV 2023
| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Image Classification | CIFAR-10 | Diffusion Classifier (zero-shot) | Percentage correct | 88.5 | #207 |
| Fine-Grained Image Classification | FGVC Aircraft | Diffusion Classifier (zero-shot) | Accuracy | 26.4 | #53 |
| Image Classification | Flowers-102 | Diffusion Classifier (zero-shot) | Per-Class Accuracy | 66.3 | #1 |
| Zero-Shot Transfer Image Classification | Food-101 | Diffusion Classifier (zero-shot) | Top 1 Accuracy | 77.7 | #5 |
| Zero-Shot Transfer Image Classification | ImageNet | Diffusion Classifier (zero-shot) | Accuracy (Private) | 61.4 | #21 |
| Image Classification | ImageNet | Diffusion Classifier | Top 1 Accuracy | 79.1% | #728 |
| Domain Generalization | ImageNet-A | Diffusion Classifier | Top-1 accuracy % | 30.2 | #27 |
| Image Classification | ObjectNet (ImageNet classes) | Diffusion Classifier (zero-shot) | Top 1 Accuracy | 43.4 | #1 |
| Image Classification | ObjectNet (ImageNet classes) | Diffusion Classifier | Top 1 Accuracy | 33.9 | #2 |
| Image Classification | Oxford-IIIT Pets | Diffusion Classifier (zero-shot) | Per-Class Accuracy | 87.3 | #1 |
| Image Classification | STL-10 | Diffusion Classifier (zero-shot) | Percentage correct | 95.4 | #19 |
| Visual Reasoning | Winoground | Diffusion Classifier (zero-shot) | Text Score | 34.00 | #55 |

Methods