Text-to-Image Generation
287 papers with code • 11 benchmarks • 18 datasets
Text-to-Image Generation is a task in computer vision and natural language processing whose goal is to synthesize an image that matches a given textual description. This typically involves encoding the text input into a meaningful representation, such as a feature vector, and then decoding that representation into a corresponding image.
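As a concrete illustration, the open-source diffusers library exposes pretrained text-to-image pipelines behind a few lines of Python. The model checkpoint, prompt, and sampling parameters below are illustrative choices, not a recommendation:

import torch
from diffusers import StableDiffusionPipeline

# Load a pretrained latent diffusion pipeline (downloads weights on first use).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # assumes a CUDA-capable GPU is available

# Encode the prompt and sample an image that matches it.
image = pipe(
    "a watercolor painting of a fox in a snowy forest",
    num_inference_steps=50,  # denoising steps: more steps, slower but finer
    guidance_scale=7.5,      # strength of the text conditioning
).images[0]
image.save("fox.png")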
Most implemented papers
MaskGIT: Masked Generative Image Transformer
At inference time, the model first generates all tokens of an image in parallel, then iteratively refines the image conditioned on the previous generation.
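A minimal sketch of this iterative parallel decoding, using a stand-in model and an assumed cosine masking schedule (the function, schedule, and hyperparameters are hypothetical illustrations, not the authors' released code):

import math
import torch

def maskgit_decode(model, seq_len=256, mask_id=1024, steps=8):
    # Start with every position masked.
    tokens = torch.full((seq_len,), mask_id, dtype=torch.long)
    for t in range(1, steps + 1):
        masked = tokens == mask_id
        if not masked.any():
            break
        logits = model(tokens)              # (seq_len, vocab) predictions
        probs = logits.softmax(dim=-1)
        conf, pred = probs.max(dim=-1)      # confidence and argmax per position
        conf[~masked] = float("inf")        # already-fixed tokens stay fixed
        tokens = torch.where(masked, pred, tokens)  # tentatively fill every mask
        # Cosine schedule: how many positions remain masked after this step.
        n_mask = math.floor(seq_len * math.cos(math.pi / 2 * t / steps))
        if n_mask > 0:
            # Re-mask the n_mask least confident positions for the next round.
            remask = conf.topk(n_mask, largest=False).indices
            tokens[remask] = mask_id
    return tokens

dummy = lambda toks: torch.randn(toks.shape[0], 1024)  # stand-in for a real transformer
print(maskgit_decode(dummy)[:10])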
TediGAN: Text-Guided Diverse Face Image Generation and Manipulation
In this work, we propose TediGAN, a novel framework for multi-modal image generation and manipulation with textual descriptions.
DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis
If the initial image is poorly generated, the subsequent refinement stages can hardly improve it to a satisfactory quality.
CogView: Mastering Text-to-Image Generation via Transformers
Text-to-Image generation in the general domain has long been an open problem, which requires both a powerful generative model and cross-modal understanding.
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
In this work, we pursue a unified paradigm for multimodal pretraining to break the scaffolds of complex task/modality-specific customization.
A Novel Sampling Scheme for Text- and Image-Conditional Image Synthesis in Quantized Latent Spaces
Recent advancements in the domain of text-to-image synthesis have culminated in a multitude of enhancements pertaining to quality, fidelity, and diversity.
Muse: Text-To-Image Generation via Masked Generative Transformers
Compared to pixel-space diffusion models such as Imagen and DALL-E 2, Muse is significantly more efficient because it uses discrete tokens and requires fewer sampling iterations; compared to autoregressive models such as Parti, Muse is more efficient because of its parallel decoding.
Tell, Draw, and Repeat: Generating and Modifying Images Based on Continual Linguistic Instruction
Conditional text-to-image generation is an active area of research, with many possible applications.
ManiGAN: Text-Guided Image Manipulation
The goal of our paper is to semantically edit parts of an image matching a given text that describes desired attributes (e.g., texture, colour, and background), while preserving other contents that are irrelevant to the text.
Conditional Image Generation and Manipulation for User-Specified Content
User control over the generated content can be achieved by conditioning the model on additional information.
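As a bare-bones illustration of such conditioning, a generator can simply concatenate a text embedding with its noise input before decoding; the architecture and dimensions below are illustrative stand-ins, not the paper's model:

import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    # Toy conditional generator: noise z and a text embedding are
    # concatenated and mapped to a flat 64x64 RGB image.
    def __init__(self, z_dim=128, text_dim=256, out_dim=64 * 64 * 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + text_dim, 512), nn.ReLU(),
            nn.Linear(512, out_dim), nn.Tanh(),
        )

    def forward(self, z, text_emb):
        return self.net(torch.cat([z, text_emb], dim=-1))

g = ConditionalGenerator()
img = g(torch.randn(1, 128), torch.randn(1, 256))  # shape (1, 64*64*3)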