Text to Image Generation

458 papers with code • 1 benchmark • 1 dataset

Text-to-image generation is the task of synthesizing an image that matches a natural-language description.

Most implemented papers

AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks

taoxugit/AttnGAN CVPR 2018

In this paper, we propose an Attentional Generative Adversarial Network (AttnGAN) that allows attention-driven, multi-stage refinement for fine-grained text-to-image generation.
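
The core mechanism is a cross-attention step in which each image sub-region attends over the caption's word embeddings, producing a word-context vector per region that the next generator stage consumes. A minimal PyTorch sketch of that attention (tensor shapes are illustrative, not the paper's exact layer):

```python
import torch
import torch.nn.functional as F

def word_level_attention(region_feats, word_feats):
    """Fine-grained attention in the style of AttnGAN (sketch).

    region_feats: (B, N, D) sub-region features from the image generator
    word_feats:   (B, T, D) word embeddings from the text encoder
    Returns one word-context vector per image region.
    """
    scores = torch.bmm(region_feats, word_feats.transpose(1, 2))  # (B, N, T)
    attn = F.softmax(scores, dim=-1)       # each region attends over the words
    context = torch.bmm(attn, word_feats)  # (B, N, D) word-context vectors
    return context
```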

Zero-Shot Text-to-Image Generation

openai/DALL-E 24 Feb 2021

Text-to-image generation has traditionally focused on finding better modeling assumptions for training on a fixed dataset.
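
DALL-E instead trains a discrete VAE that compresses an image into a grid of codebook tokens, plus an autoregressive transformer over the concatenation of text and image tokens. A toy illustration of that joint token stream (vocabulary sizes follow the paper; the variable names are ours, not OpenAI's API):

```python
import torch

# Stage 1: a discrete VAE turns a 256x256 image into a 32x32 grid of codes.
# Stage 2: a transformer models text tokens followed by image tokens as one stream.
text_tokens = torch.randint(0, 16384, (1, 256))   # BPE caption (text vocab 16384)
image_tokens = torch.randint(0, 8192, (1, 1024))  # dVAE codes (image vocab 8192)
stream = torch.cat([text_tokens, image_tokens + 16384], dim=1)  # shape (1, 1280)

# Training maximizes next-token likelihood over `stream`; at inference the caption
# is fixed, the 1024 image positions are sampled one at a time, and the dVAE
# decoder maps the resulting codes back to pixels.
```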

Adding Conditional Control to Text-to-Image Diffusion Models

lllyasviel/controlnet ICCV 2023

ControlNet locks a production-ready large diffusion model and reuses its deep, robust encoding layers, pretrained on billions of images, as a strong backbone for learning a diverse set of conditional controls.
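
This design is exposed directly in the Hugging Face diffusers library; a minimal sketch using a Canny-edge ControlNet (the model IDs are commonly published checkpoints, and the edge-map file is a placeholder):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Attach a Canny-edge ControlNet to a frozen Stable Diffusion backbone.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

edges = load_image("edges.png")  # a precomputed Canny edge map of the desired layout
image = pipe("a red sports car at sunset", image=edges, num_inference_steps=30).images[0]
image.save("car.png")
```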

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

rinongal/textual_inversion 2 Aug 2022

Text-to-image models offer unprecedented freedom to guide creation through natural language. Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes.
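
The paper's answer is to learn a new embedding (a "pseudo-word") for the concept while keeping the model itself frozen. In diffusers loading such a concept is a one-liner; a minimal sketch (the concept repository is an assumed example from the sd-concepts-library collection):

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to("cuda")

# Load a learned concept embedding; it registers a pseudo-word such as <cat-toy>
# whose embedding was optimized to reconstruct a handful of example images.
pipe.load_textual_inversion("sd-concepts-library/cat-toy")

image = pipe("a <cat-toy> floating in space, oil painting").images[0]
image.save("concept.png")
```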

InstructPix2Pix: Learning to Follow Image Editing Instructions

timothybrooks/instruct-pix2pix CVPR 2023

We propose a method for editing images from human instructions: given an input image and a written instruction that tells the model what to do, our model follows these instructions to edit the image.
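
The released model can be run through diffusers; a minimal sketch (the input filename is a placeholder; image_guidance_scale trades faithfulness to the source image against edit strength):

```python
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

source = load_image("photo.png")  # the image to edit
edited = pipe(
    "make it look like winter",
    image=source,
    num_inference_steps=20,
    image_guidance_scale=1.5,
).images[0]
edited.save("edited.png")
```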

Composer: Creative and Controllable Image Synthesis with Composable Conditions

damo-vilab/composer 20 Feb 2023

Recent large-scale generative models learned on big data are capable of synthesizing incredible images yet suffer from limited controllability.

Muse: Text-To-Image Generation via Masked Generative Transformers

lucidrains/muse-pytorch 2 Jan 2023

Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding.
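
Parallel decoding here means starting from a fully masked token grid and, at each of a small number of steps, predicting every masked token at once and keeping only the most confident predictions. A minimal sketch of that loop in the MaskGIT/Muse style, where `model` is a hypothetical masked transformer returning per-token logits:

```python
import math
import torch

def muse_parallel_decode(model, text_emb, seq_len=256, mask_id=8192, steps=12):
    # Start from a fully masked grid of image tokens.
    tokens = torch.full((1, seq_len), mask_id, dtype=torch.long)
    for step in range(steps):
        logits = model(tokens, text_emb)               # (1, seq_len, vocab) logits
        conf, pred = logits.softmax(dim=-1).max(dim=-1)
        masked = tokens == mask_id
        tokens = torch.where(masked, pred, tokens)     # fill every masked slot at once
        conf = torch.where(masked, conf, torch.tensor(float("inf")))  # keep fixed tokens
        # Cosine schedule: re-mask the least confident tokens, fewer each step.
        n_mask = int(seq_len * math.cos(math.pi / 2 * (step + 1) / steps))
        if n_mask == 0:
            break
        remask = conf.topk(n_mask, dim=-1, largest=False).indices
        tokens.scatter_(1, remask, mask_id)
    return tokens  # VQ indices; the tokenizer's decoder maps them back to pixels
```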

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

luosiallen/latent-consistency-model 6 Oct 2023

Inspired by Consistency Models (Song et al.), we propose Latent Consistency Models (LCMs), enabling swift inference with minimal steps on any pre-trained LDMs, including Stable Diffusion (Rombach et al.).
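
With a recent diffusers release, a distilled LCM checkpoint can be sampled in a handful of steps; a minimal sketch (the checkpoint ID is the authors' published Dreamshaper distillation; the step count and guidance value are illustrative):

```python
import torch
from diffusers import DiffusionPipeline

# An LCM distilled from a Stable-Diffusion-family LDM; a few steps replace
# the usual ~50-step sampling loop.
pipe = DiffusionPipeline.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "a photo of an astronaut riding a horse",
    num_inference_steps=4,   # LCMs are typically run with 2-8 steps
    guidance_scale=8.0,
).images[0]
image.save("lcm.png")
```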

CogView: Mastering Text-to-Image Generation via Transformers

THUDM/CogView NeurIPS 2021

Text-to-Image generation in the general domain has long been an open problem, which requires both a powerful generative model and cross-modal understanding.

A Novel Sampling Scheme for Text- and Image-Conditional Image Synthesis in Quantized Latent Spaces

dome272/paella 14 Nov 2022

Recent advancements in the domain of text-to-image synthesis have culminated in a multitude of enhancements pertaining to quality, fidelity, and diversity.
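
Paella's sampler operates directly on VQ token grids: latents are "noised" by replacing tokens with random codebook entries, and sampling alternates between predicting clean tokens and renoising a shrinking random subset. A rough sketch of that loop, with `model` a hypothetical text- and step-conditioned token predictor:

```python
import torch

def renoise_sample(model, text_emb, seq_len=1024, vocab=8192, steps=8):
    # Fully "noised" latent: every position is a random codebook index.
    tokens = torch.randint(0, vocab, (1, seq_len))
    for step in range(steps):
        logits = model(tokens, text_emb, step)               # (1, seq_len, vocab)
        pred = logits.argmax(dim=-1)                         # predicted clean tokens
        keep = torch.rand(1, seq_len) < (step + 1) / steps   # keep a growing fraction
        noise = torch.randint(0, vocab, (1, seq_len))
        tokens = torch.where(keep, pred, noise)              # renoise the remainder
    return tokens  # decoded back to pixels by the VQGAN decoder
```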