Text-to-Image Generation

Zero-Shot Text-to-Image Generation

openai/DALL-E 24 Feb 2021

Text-to-image generation has traditionally focused on finding better modeling assumptions for training on a fixed dataset.

StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks

hanzhanggit/StackGAN 19 Oct 2017

In this paper, we propose Stacked Generative Adversarial Networks (StackGAN) aiming at generating high-resolution photo-realistic images.

StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks

hanzhanggit/StackGAN ICCV 2017

Synthesizing high-quality images from text descriptions is a challenging problem in computer vision and has many practical applications.

Generative Adversarial Text to Image Synthesis

hanzhanggit/StackGAN 17 May 2016

Automatic synthesis of realistic images from text would be interesting and useful, but current AI systems are still far from this goal.

CogView: Mastering Text-to-Image Generation via Transformers

lucidrains/x-transformers NeurIPS 2021

Text-to-Image generation in the general domain has long been an open problem, which requires both a powerful generative model and cross-modal understanding.

AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks

taoxugit/AttnGAN CVPR 2018

In this paper, we propose an Attentional Generative Adversarial Network (AttnGAN) that allows attention-driven, multi-stage refinement for fine-grained text-to-image generation.

Ranked #3 on Text-to-Image Generation on COCO (SOA-C metric)

Generating Images from Captions with Attention

mansimov/text2image 9 Nov 2015

Motivated by the recent progress in generative models, we introduce a model that generates images from natural language descriptions.

NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion

microsoft/nuwa 24 Nov 2021

To cover language, image, and video at the same time for different scenarios, a 3D transformer encoder-decoder framework is designed, which can not only deal with videos as 3D data but also adapt to texts and images as 1D and 2D data, respectively.

Towards Open-World Text-Guided Face Image Generation and Manipulation

IIGROUP/TediGAN 18 Apr 2021

To be specific, we propose a brand new paradigm of text-guided image generation and manipulation based on the superior characteristics of a pretrained GAN model.

TediGAN: Text-Guided Diverse Face Image Generation and Manipulation

weihaox/TediGAN CVPR 2021

In this work, we propose TediGAN, a novel framework for multi-modal image generation and manipulation with textual descriptions.

