TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Text-to-Image Generation	LAION COCO	Parti Finetuned	FID	8.39	# 1
Text-to-Image Generation	LAION COCO	Parti	FID	15.97	# 2
Text-to-Image Generation	MS COCO	Parti Finetuned	FID	3.22	# 1
Text-to-Image Generation	MS COCO	Parti	FID	7.23	# 16
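
For readers unfamiliar with the metric in the table above: FID (Fréchet Inception Distance) compares Gaussian fits to Inception-network features of real and generated images, and lower values are better. The standard definition is given below as a reference. The symbols are the conventional ones and are not taken from this page: \mu_r, \Sigma_r are the mean and covariance of the real-image features, and \mu_g, \Sigma_g those of the generated-image features.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```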

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

22 Jun 2022 · Jiahui Yu, Yuanzhong Xu, Jing Yu Koh, Thang Luong, Gunjan Baid, ZiRui Wang, Vijay Vasudevan, Alexander Ku, Yinfei Yang, Burcu Karagol Ayan, Ben Hutchinson, Wei Han, Zarana Parekh, Xin Li, Han Zhang, Jason Baldridge, Yonghui Wu

We present the Pathways Autoregressive Text-to-Image (Parti) model, which generates high-fidelity photorealistic images and supports content-rich synthesis involving complex compositions and world knowledge. Parti treats text-to-image generation as a sequence-to-sequence modeling problem, akin to machine translation, with sequences of image tokens as the target outputs rather than text tokens in another language. This strategy can naturally tap into the rich body of prior work on large language models, which have seen continued advances in capabilities and performance through scaling data and model sizes. Our approach is simple: First, Parti uses a Transformer-based image tokenizer, ViT-VQGAN, to encode images as sequences of discrete tokens. Second, we achieve consistent quality improvements by scaling the encoder-decoder Transformer model up to 20B parameters, with a new state-of-the-art zero-shot FID score of 7.23 and finetuned FID score of 3.22 on MS-COCO. Our detailed analysis on Localized Narratives as well as PartiPrompts (P2), a new holistic benchmark of over 1600 English prompts, demonstrates the effectiveness of Parti across a wide variety of categories and difficulty aspects. We also explore and highlight limitations of our models in order to define and exemplify key areas of focus for further improvements. See https://parti.research.google/ for high-resolution images.
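
The abstract describes encoding images as sequences of discrete tokens. The general idea behind such tokenizers, vector quantization by nearest-codebook lookup, can be sketched as follows. This is a toy illustration only, not the ViT-VQGAN implementation: the 2-D patch vectors and the four-entry codebook are made up by hand (in a real tokenizer both the encoder and the codebook are learned), and the helper names `quantize` and `detokenize` are hypothetical.

```python
# Toy vector quantization: map patch vectors to discrete token ids.
# Assumption: patches are already embedded as small float vectors;
# a real tokenizer would produce them with a learned encoder.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(patch_vectors, codebook):
    """Return, for each patch vector, the index of its nearest codebook entry."""
    tokens = []
    for vec in patch_vectors:
        distances = [squared_distance(vec, code) for code in codebook]
        tokens.append(distances.index(min(distances)))
    return tokens

def detokenize(tokens, codebook):
    """Look up the codebook vector for each token id."""
    return [codebook[t] for t in tokens]

# Hand-written codebook with four entries (token ids 0..3).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Four made-up patch embeddings standing in for one tiny image.
patches = [[0.1, -0.2], [0.9, 0.8], [0.2, 1.1], [0.7, 0.1]]

image_tokens = quantize(patches, codebook)
print(image_tokens)  # [0, 3, 2, 1]
```

In a sequence-to-sequence setup like the one the abstract describes, a token sequence of this kind would be the target that the decoder predicts one token at a time from the encoded text prompt.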

Code

lucidrains/parti-pytorch (505 GitHub stars)

syang-lab/Pathway_Autoregressive_Te…

Tasks

Image Generation

Machine Translation

Text-to-Image Generation

World Knowledge

Datasets

Introduced in the Paper:

Used in the Paper:

MS COCO test

LAION-400M

Localized Narratives

LAION COCO

Results from the Paper

Ranked #1 on Text-to-Image Generation on LAION COCO

Methods

Absolute Position Encodings • Adam • BPE • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer
