Text-to-Image Generation

276 papers with code • 11 benchmarks • 18 datasets

Text-to-Image Generation is a task in computer vision and natural language processing where the goal is to generate an image that corresponds to a given textual description. This typically involves encoding the text into a meaningful representation, such as an embedding vector, and then conditioning an image generator, most commonly a GAN or a diffusion model, on that representation so that the output matches the description.
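
As a concrete illustration, here is a minimal sketch of running an off-the-shelf text-to-image model. It assumes the Hugging Face diffusers library, PyTorch, a CUDA-capable GPU, and the runwayml/stable-diffusion-v1-5 checkpoint; any other text-to-image pipeline could be swapped in.

```python
# Minimal text-to-image sketch using a pretrained diffusion pipeline.
# Assumes: `pip install diffusers transformers torch` and a CUDA-capable GPU.
import torch
from diffusers import StableDiffusionPipeline

# Load a pretrained pipeline (text encoder + denoising U-Net + VAE decoder).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The pipeline encodes the prompt into text embeddings and uses them to
# condition the iterative denoising that produces the final image.
image = pipe("a watercolor painting of a fox in a misty forest").images[0]
image.save("fox.png")
```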

Most implemented papers

DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis

tobran/DF-GAN CVPR 2022

To these ends, we propose the simpler but more effective Deep Fusion Generative Adversarial Networks (DF-GAN).
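
The core idea is a fusion block that injects the sentence embedding into the image features at every generator stage. Below is a minimal, hypothetical PyTorch sketch of such text-conditioned affine fusion; the module name and dimensions are illustrative and this is not the tobran/DF-GAN implementation.

```python
# Sketch of text-image fusion via per-channel affine modulation:
# scale and shift are predicted from the sentence embedding.
import torch
import torch.nn as nn

class TextAffineFusion(nn.Module):
    def __init__(self, text_dim: int, num_channels: int):
        super().__init__()
        self.to_gamma = nn.Linear(text_dim, num_channels)  # per-channel scale
        self.to_beta = nn.Linear(text_dim, num_channels)   # per-channel shift

    def forward(self, img_feat: torch.Tensor, sent_emb: torch.Tensor) -> torch.Tensor:
        # img_feat: (B, C, H, W), sent_emb: (B, text_dim)
        gamma = self.to_gamma(sent_emb).unsqueeze(-1).unsqueeze(-1)
        beta = self.to_beta(sent_emb).unsqueeze(-1).unsqueeze(-1)
        return img_feat * (1 + gamma) + beta

fusion = TextAffineFusion(text_dim=256, num_channels=64)
out = fusion(torch.randn(2, 64, 16, 16), torch.randn(2, 256))
```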

Autoregressive Image Generation using Residual Quantization

kakaobrain/rq-vae-transformer CVPR 2022

However, we postulate that previous vector quantization (VQ) cannot shorten the code sequence and generate high-fidelity images at the same time in terms of the rate-distortion trade-off.
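
The underlying mechanism, residual quantization, represents a feature vector as a sum of codebook entries by quantizing the remaining residual at each depth. The sketch below is an illustrative PyTorch version with a single shared codebook; it is not the kakaobrain/rq-vae-transformer code.

```python
# Residual quantization sketch: approximate each feature vector by a stack of
# codebook entries, re-quantizing the residual at every depth.
import torch

def residual_quantize(z: torch.Tensor, codebook: torch.Tensor, depth: int):
    """z: (B, D) features, codebook: (K, D). Returns code indices and reconstruction."""
    residual = z
    recon = torch.zeros_like(z)
    codes = []
    for _ in range(depth):
        dists = torch.cdist(residual, codebook)   # (B, K) distances to codes
        idx = dists.argmin(dim=1)                 # nearest code per vector
        selected = codebook[idx]                  # (B, D)
        recon = recon + selected
        residual = residual - selected            # quantize what is left next round
        codes.append(idx)
    return torch.stack(codes, dim=1), recon       # (B, depth), (B, D)

codes, recon = residual_quantize(torch.randn(4, 32), torch.randn(512, 32), depth=4)
```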

All are Worth Words: A ViT Backbone for Diffusion Models

baofff/U-ViT CVPR 2023

We evaluate U-ViT in unconditional and class-conditional image generation, as well as text-to-image generation tasks, where U-ViT is comparable if not superior to a CNN-based U-Net of a similar size.

Versatile Diffusion: Text, Images and Variations All in One Diffusion Model

shi-labs/versatile-diffusion ICCV 2023

In this work, we expand the existing single-flow diffusion pipeline into a multi-task multimodal network, dubbed Versatile Diffusion (VD), that handles multiple flows of text-to-image, image-to-text, and variations in one unified model.

One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale

thu-ml/unidiffuser 12 Mar 2023

Inspired by the unified view, UniDiffuser learns all distributions simultaneously with a minimal modification to the original diffusion model: it perturbs data in all modalities instead of a single modality, takes individual timesteps for the different modalities as input, and predicts the noise of all modalities rather than a single one.
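
A minimal sketch of that recipe is shown below: both modalities are noised with their own timesteps and a single network predicts the noise for both. The stand-in MLP and all dimensions are hypothetical; the actual thu-ml/unidiffuser model is a transformer over latent tokens.

```python
# Joint noise prediction with per-modality timesteps (illustrative only).
import torch
import torch.nn as nn

class JointNoisePredictor(nn.Module):
    def __init__(self, img_dim: int, txt_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim + txt_dim + 2, hidden), nn.SiLU(),
            nn.Linear(hidden, img_dim + txt_dim),
        )
        self.img_dim = img_dim

    def forward(self, x_img, x_txt, t_img, t_txt):
        # Separate timesteps for each modality are fed alongside the noisy latents.
        h = torch.cat([x_img, x_txt, t_img[:, None], t_txt[:, None]], dim=-1)
        eps = self.net(h)
        return eps[:, : self.img_dim], eps[:, self.img_dim :]

model = JointNoisePredictor(img_dim=16, txt_dim=8)
B = 4
x_img, x_txt = torch.randn(B, 16), torch.randn(B, 8)
t_img, t_txt = torch.rand(B), torch.rand(B)   # independent timesteps per modality
eps_img, eps_txt = model(x_img, x_txt, t_img, t_txt)
```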

GlyphDraw: Seamlessly Rendering Text with Intricate Spatial Structures in Text-to-Image Generation

OPPO-Mente-Lab/GlyphDraw 31 Mar 2023

Recent breakthroughs in the field of language-guided image generation have yielded impressive achievements, enabling the creation of high-quality and diverse images based on user instructions. Although the synthesis performance is fascinating, one significant limitation of current image generation models is their insufficient ability to generate text coherently within images, particularly for complex glyph structures like Chinese characters.

MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing

tencentarc/masactrl ICCV 2023

Despite the success in large-scale text-to-image generation and text-conditioned image editing, existing methods still struggle to produce consistent generation and editing results.

ProSpect: Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models

zyxElsa/ProSpect 25 May 2023

We apply ProSpect in various personalized attribute-aware image generation applications, such as image-guided or text-driven manipulations of materials, style, and layout, achieving previously unattainable results from a single image input without fine-tuning the diffusion models.

BK-SDM: A Lightweight, Fast, and Cheap Version of Stable Diffusion

segmind/distill-sd 25 May 2023

Text-to-image (T2I) generation with Stable Diffusion models (SDMs) involves high computing demands due to billion-scale parameters.

StyleDrop: Text-to-Image Generation in Any Style

zideliu/StyleDrop-PyTorch 1 Jun 2023

Pre-trained large text-to-image models synthesize impressive images with an appropriate use of text prompts.