Shifted Diffusion for Text-to-image Generation

CVPR 2023 · Yufan Zhou, Bingchen Liu, Yizhe Zhu, Xiao Yang, Changyou Chen, Jinhui Xu

We present Corgi, a novel method for text-to-image generation. Corgi is based on our proposed shifted diffusion model, which generates better image embeddings from input text. Unlike the baseline diffusion model used in DALL-E 2, our method encodes prior knowledge of the pre-trained CLIP model in its diffusion process through a new initialization distribution and a new transition step. Compared to the strong DALL-E 2 baseline, our method generates image embeddings from text both more efficiently and more effectively, resulting in better text-to-image generation. Extensive large-scale experiments, evaluated with both quantitative measures and human evaluation, indicate that our method has stronger generation ability than existing ones. Furthermore, our model enables semi-supervised and language-free training for text-to-image generation, where only some or none of the images in the training dataset have an associated caption. Trained with only 1.7% of the images captioned, our semi-supervised model obtains FID results comparable to DALL-E 2 on zero-shot text-to-image generation evaluated on MS-COCO. Corgi also achieves new state-of-the-art results across different datasets on downstream language-free text-to-image generation tasks, outperforming the previous method, Lafite, by a large margin.
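The abstract states only that the diffusion process gets a new initialization distribution and a new transition step informed by CLIP. As a rough illustration of what a "shifted" forward process could look like, the sketch below pulls the noised embedding toward a prior mean `mu` with per-dimension standard deviation `sigma`, instead of toward a standard Gaussian. Both statistics are assumed here to be estimated from CLIP image embeddings. The Gaussian form, the diagonal covariance, and all function names are assumptions made for this example and are not taken from this page or from the official repository.

```python
import math
import random


def shifted_forward_mean(z0, mu, alpha_bar):
    """Mean of z_t given z_0 under the assumed shifted process.

    alpha_bar is the cumulative signal level in [0, 1]:
    1.0 means no noise yet, 0.0 means fully diffused.
    """
    a = math.sqrt(alpha_bar)
    # Interpolate between the clean embedding and the prior mean.
    return [a * z + (1.0 - a) * m for z, m in zip(z0, mu)]


def shifted_forward_sample(z0, mu, sigma, alpha_bar, rng=random):
    """Draw z_t ~ N(mean, (1 - alpha_bar) * diag(sigma^2)) (assumed form)."""
    mean = shifted_forward_mean(z0, mu, alpha_bar)
    scale = math.sqrt(1.0 - alpha_bar)
    return [m + scale * s * rng.gauss(0.0, 1.0) for m, s in zip(mean, sigma)]


def sample_prior(mu, sigma, rng=random):
    """Starting point of the reverse process: N(mu, diag(sigma^2)),
    rather than the usual N(0, I)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
```

Under these assumptions, `alpha_bar = 1.0` returns the clean embedding unchanged, and as `alpha_bar` approaches `0.0` the sample distribution approaches the prior drawn by `sample_prior`, so the reverse process would start closer to the region where valid image embeddings lie.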

Code

drboog/Shifted_Diffusion official

154

Tasks

Image Generation

Text-to-Image Generation

Zero-Shot Text-to-Image Generation

Datasets

MS COCO

CUB-200-2011

DrawBench

Multi-Modal CelebA-HQ

Results from the Paper

Ranked #3 on Text-to-Image Generation on Multi-Modal-CelebA-HQ

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Text-to-Image Generation	MS COCO	Corgi-Semi	FID	10.6	#28
Text-to-Image Generation	MS COCO	Corgi	FID	10.88	#29
Text-to-Image Generation	Multi-Modal-CelebA-HQ	Corgi	FID	19.74	#3
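
For readers unfamiliar with the metric in the table: FID (Fréchet Inception Distance) is the Fréchet distance between two Gaussians fitted to Inception features of real and generated images, and lower values are better. The sketch below computes it in a simplified setting that assumes diagonal covariances, so the trace term reduces to a per-dimension sum. The standard metric uses full covariance matrices and a matrix square root. The function name `fid_diagonal` is hypothetical, and this is not the evaluation code behind the numbers above.

```python
import math


def fid_diagonal(mu_a, var_a, mu_b, var_b):
    """Fréchet distance between N(mu_a, diag(var_a)) and N(mu_b, diag(var_b)).

    Simplification: with diagonal covariances,
    Tr(S_a + S_b - 2 (S_a S_b)^{1/2}) becomes a sum over dimensions.
    """
    # Squared Euclidean distance between the feature means.
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
    # Covariance term, one dimension at a time.
    cov_term = sum(
        va + vb - 2.0 * math.sqrt(va * vb) for va, vb in zip(var_a, var_b)
    )
    return mean_term + cov_term
```

Identical feature statistics give a distance of zero, and the value grows as either the means or the variances of the two feature sets drift apart.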

Methods

CLIP • Diffusion