Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Text-to-Image Generation	CUB	VQ-Diffusion-F	FID	10.32	#5
Text-to-Image Generation	CUB	VQ-Diffusion-S	FID	12.97	#8
Text-to-Image Generation	CUB	VQ-Diffusion-B	FID	11.94	#7
Text-to-Image Generation	MS COCO	VQ-Diffusion-B	FID	19.75	#44
Text-to-Image Generation	MS COCO	VQ-Diffusion-F	FID	13.86	#38
Text-to-Image Generation	Oxford 102 Flowers	VQ-Diffusion-F	FID	14.1	#1
Text-to-Image Generation	Oxford 102 Flowers	VQ-Diffusion-S	FID	14.95	#3
Text-to-Image Generation	Oxford 102 Flowers	VQ-Diffusion-B	FID	14.88	#2

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/vector-quantized-diffusion-model-for-text-to/text-to-image-generation-on-oxford-102)](https://paperswithcode.com/sota/text-to-image-generation-on-oxford-102?p=vector-quantized-diffusion-model-for-text-to)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/vector-quantized-diffusion-model-for-text-to/text-to-image-generation-on-cub)](https://paperswithcode.com/sota/text-to-image-generation-on-cub?p=vector-quantized-diffusion-model-for-text-to)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/vector-quantized-diffusion-model-for-text-to/text-to-image-generation-on-coco)](https://paperswithcode.com/sota/text-to-image-generation-on-coco?p=vector-quantized-diffusion-model-for-text-to)`

Vector Quantized Diffusion Model for Text-to-Image Synthesis

CVPR 2022 · Shuyang Gu, Dong Chen, Jianmin Bao, Fang Wen, Bo Zhang, Dongdong Chen, Lu Yuan, Baining Guo

We present the vector quantized diffusion (VQ-Diffusion) model for text-to-image generation. This method is based on a vector quantized variational autoencoder (VQ-VAE) whose latent space is modeled by a conditional variant of the recently developed Denoising Diffusion Probabilistic Model (DDPM). We find that this latent-space method is well-suited for text-to-image generation tasks because it not only eliminates the unidirectional bias of existing methods but also allows us to incorporate a mask-and-replace diffusion strategy to avoid the accumulation of errors, which is a serious problem with existing methods. Our experiments show that VQ-Diffusion produces significantly better text-to-image generation results when compared with conventional autoregressive (AR) models with similar numbers of parameters. Compared with previous GAN-based text-to-image methods, our VQ-Diffusion can handle more complex scenes and improve the synthesized image quality by a large margin. Finally, we show that the image generation computation in our method can be made highly efficient by reparameterization. With traditional AR methods, the text-to-image generation time increases linearly with the output image resolution and hence is quite time-consuming even for normal-size images. VQ-Diffusion allows us to achieve a better trade-off between quality and speed. Our experiments indicate that the VQ-Diffusion model with the reparameterization is fifteen times faster than traditional AR methods while achieving better image quality.
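The abstract names the mask-and-replace diffusion strategy without describing its mechanics. The following is a minimal sketch of one forward corruption step on a sequence of VQ-VAE codebook indices, not the authors' implementation. It assumes a per-step schedule in which each token is masked with probability `gamma`, resampled uniformly from the codebook with total probability `num_codes * beta`, and otherwise kept, and that a masked token stays masked. The function name, the parameter names, and the choice of `num_codes` as the mask id are all hypothetical and do not come from this page.

```python
import random


def mask_and_replace_step(tokens, num_codes, gamma, beta, rng):
    """Apply one illustrative forward corruption step to codebook indices.

    Assumptions (not taken from the page): index `num_codes` acts as a
    [MASK] id; `gamma` is the per-token masking probability; `beta` is the
    probability of jumping to any one specific codebook entry, so the total
    resampling probability is `num_codes * beta`. The caller must ensure
    gamma + num_codes * beta <= 1.
    """
    mask_id = num_codes
    out = []
    for tok in tokens:
        if tok == mask_id:
            # Assumed absorbing state: once masked, a token stays masked.
            out.append(mask_id)
            continue
        u = rng.random()
        if u < gamma:
            out.append(mask_id)
        elif u < gamma + num_codes * beta:
            # Uniform resample; this may return the original index.
            out.append(rng.randrange(num_codes))
        else:
            out.append(tok)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = [3, 1, 4, 1, 5, 9, 2, 6]
    # Repeated steps corrupt the sequence progressively.
    for _ in range(5):
        tokens = mask_and_replace_step(tokens, 16, 0.2, 0.01, rng)
    print(tokens)
```

Under these assumptions, a denoising network can tell which positions are masked and must be predicted, while the randomly replaced positions force it to also reconsider unmasked tokens using their context.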

Code

cientgu/vq-diffusion (official), 403 stars

microsoft/vq-diffusion, 836 stars

Tasks

Denoising

Image Generation

Text-to-Image Generation

Datasets

ImageNet

MS COCO

CUB-200-2011

FFHQ

Oxford 102 Flower

Conceptual Captions

LAION-400M

Results from the Paper

Ranked #1 on Text-to-Image Generation on Oxford 102 Flowers (using extra training data)

Methods

AutoEncoder • Diffusion
