Text-to-Image Generation

276 papers with code • 11 benchmarks • 18 datasets

Text-to-Image Generation is a task in computer vision and natural language processing where the goal is to generate an image that corresponds to a given textual description. This typically involves encoding the text into a meaningful representation, such as an embedding vector, and then conditioning an image generator, most commonly a GAN or a diffusion model, on that representation so that the output matches the description.
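
As a concrete illustration, here is a minimal sketch of running an off-the-shelf text-to-image model. It assumes the Hugging Face diffusers library, PyTorch, a CUDA-capable GPU, and the runwayml/stable-diffusion-v1-5 checkpoint; any other text-to-image pipeline could be swapped in.

```python
# Minimal text-to-image sketch using a pretrained diffusion pipeline.
# Assumes: `pip install diffusers transformers torch` and a CUDA-capable GPU.
import torch
from diffusers import StableDiffusionPipeline

# Load a pretrained pipeline (text encoder + denoising U-Net + VAE decoder).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The pipeline encodes the prompt into text embeddings and uses them to
# condition the iterative denoising that produces the final image.
image = pipe("a watercolor painting of a fox in a misty forest").images[0]
image.save("fox.png")
```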

Most implemented papers

DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis

tobran/DF-GAN CVPR 2022

To these ends, we propose the simpler but more effective Deep Fusion Generative Adversarial Networks (DF-GAN).
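
The core idea is a fusion block that injects the sentence embedding into the image features at every generator stage. Below is a minimal, hypothetical PyTorch sketch of such text-conditioned affine fusion; the module name and dimensions are illustrative and this is not the tobran/DF-GAN implementation.

```python
# Sketch of text-image fusion via per-channel affine modulation:
# scale and shift are predicted from the sentence embedding.
import torch
import torch.nn as nn

class TextAffineFusion(nn.Module):
    def __init__(self, text_dim: int, num_channels: int):
        super().__init__()
        self.to_gamma = nn.Linear(text_dim, num_channels)  # per-channel scale
        self.to_beta = nn.Linear(text_dim, num_channels)   # per-channel shift

    def forward(self, img_feat: torch.Tensor, sent_emb: torch.Tensor) -> torch.Tensor:
        # img_feat: (B, C, H, W), sent_emb: (B, text_dim)
        gamma = self.to_gamma(sent_emb).unsqueeze(-1).unsqueeze(-1)
        beta = self.to_beta(sent_emb).unsqueeze(-1).unsqueeze(-1)
        return img_feat * (1 + gamma) + beta

fusion = TextAffineFusion(text_dim=256, num_channels=64)
out = fusion(torch.randn(2, 64, 16, 16), torch.randn(2, 256))
```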

Autoregressive Image Generation using Residual Quantization

kakaobrain/rq-vae-transformer CVPR 2022

However, we postulate that previous vector quantization (VQ) cannot shorten the code sequence and generate high-fidelity images at the same time in terms of the rate-distortion trade-off.
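
The underlying mechanism, residual quantization, represents a feature vector as a sum of codebook entries by quantizing the remaining residual at each depth. The sketch below is an illustrative PyTorch version with a single shared codebook; it is not the kakaobrain/rq-vae-transformer code.

```python
# Residual quantization sketch: approximate each feature vector by a stack of
# codebook entries, re-quantizing the residual at every depth.
import torch

def residual_quantize(z: torch.Tensor, codebook: torch.Tensor, depth: int):
    """z: (B, D) features, codebook: (K, D). Returns code indices and reconstruction."""
    residual = z
    recon = torch.zeros_like(z)
    codes = []
    for _ in range(depth):
        dists = torch.cdist(residual, codebook)   # (B, K) distances to codes
        idx = dists.argmin(dim=1)                 # nearest code per vector
        selected = codebook[idx]                  # (B, D)
        recon = recon + selected
        residual = residual - selected            # quantize what is left next round
        codes.append(idx)
    return torch.stack(codes, dim=1), recon       # (B, depth), (B, D)

codes, recon = residual_quantize(torch.randn(4, 32), torch.randn(512, 32), depth=4)
```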

All are Worth Words: A ViT Backbone for Diffusion Models

baofff/U-ViT CVPR 2023

We evaluate U-ViT in unconditional and class-conditional image generation, as well as text-to-image generation tasks, where U-ViT is comparable if not superior to a CNN-based U-Net of a similar size.

Versatile Diffusion: Text, Images and Variations All in One Diffusion Model

shi-labs/versatile-diffusion ICCV 2023

In this work, we expand the existing single-flow diffusion pipeline into a multi-task multimodal network, dubbed Versatile Diffusion (VD), that handles multiple flows of text-to-image, image-to-text, and variations in one unified model.

One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale

thu-ml/unidiffuser 12 Mar 2023

Inspired by the unified view, UniDiffuser learns all distributions simultaneously with a minimal modification to the original diffusion model: it perturbs data in all modalities instead of a single modality, takes individual timesteps for the different modalities as input, and predicts the noise of all modalities rather than a single one.
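
A minimal sketch of that recipe is shown below: both modalities are noised with their own timesteps and a single network predicts the noise for both. The stand-in MLP and all dimensions are hypothetical; the actual thu-ml/unidiffuser model is a transformer over latent tokens.

```python
# Joint noise prediction with per-modality timesteps (illustrative only).
import torch
import torch.nn as nn

class JointNoisePredictor(nn.Module):
    def __init__(self, img_dim: int, txt_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim + txt_dim + 2, hidden), nn.SiLU(),
            nn.Linear(hidden, img_dim + txt_dim),
        )
        self.img_dim = img_dim

    def forward(self, x_img, x_txt, t_img, t_txt):
        # Separate timesteps for each modality are fed alongside the noisy latents.
        h = torch.cat([x_img, x_txt, t_img[:, None], t_txt[:, None]], dim=-1)
        eps = self.net(h)
        return eps[:, : self.img_dim], eps[:, self.img_dim :]

model = JointNoisePredictor(img_dim=16, txt_dim=8)
B = 4
x_img, x_txt = torch.randn(B, 16), torch.randn(B, 8)
t_img, t_txt = torch.rand(B), torch.rand(B)   # independent timesteps per modality
eps_img, eps_txt = model(x_img, x_txt, t_img, t_txt)
```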

GlyphDraw: Seamlessly Rendering Text with Intricate Spatial Structures in Text-to-Image Generation

OPPO-Mente-Lab/GlyphDraw 31 Mar 2023

Recent breakthroughs in the field of language-guided image generation have yielded impressive achievements, enabling the creation of high-quality and diverse images based on user instructions. Although the synthesis performance is fascinating, one significant limitation of current image generation models is their insufficient ability to generate text coherently within images, particularly for complex glyph structures like Chinese characters.

MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing

tencentarc/masactrl ICCV 2023

Despite the success in large-scale text-to-image generation and text-conditioned image editing, existing methods still struggle to produce consistent generation and editing results.

ProSpect: Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models

zyxElsa/ProSpect 25 May 2023

We apply ProSpect in various personalized attribute-aware image generation applications, such as image-guided or text-driven manipulations of materials, style, and layout, achieving previously unattainable results from a single image input without fine-tuning the diffusion models.

BK-SDM: A Lightweight, Fast, and Cheap Version of Stable Diffusion

segmind/distill-sd 25 May 2023

Text-to-image (T2I) generation with Stable Diffusion models (SDMs) involves high computing demands due to billion-scale parameters.

StyleDrop: Text-to-Image Generation in Any Style

zideliu/StyleDrop-PyTorch 1 Jun 2023

Pre-trained large text-to-image models synthesize impressive images with an appropriate use of text prompts.