83 papers with code • 9 benchmarks • 9 datasets
Text-to-image generation is the task of synthesizing an image from a given sentence or sequence of words.
Synthesizing high-quality images from text descriptions is a challenging problem in computer vision and has many practical applications.
In this paper, we propose an Attentional Generative Adversarial Network (AttnGAN) that allows attention-driven, multi-stage refinement for fine-grained text-to-image generation.
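The core of such attention-driven refinement is letting each image sub-region attend over the word features of the caption. Below is a minimal sketch in that spirit (not AttnGAN's actual implementation): the shapes, the dot-product scoring, and the feature dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def word_attention(regions, words):
    """Compute a per-region word-context vector.

    regions: (N, d) image sub-region features.
    words:   (L, d) word embeddings.
    Returns (context, attn): context vectors a later generator stage
    could condition on, and the (N, L) attention weights.
    """
    scores = regions @ words.T                  # (N, L) similarity scores
    scores -= scores.max(axis=1, keepdims=True) # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)     # softmax over words
    return attn @ words, attn                   # (N, d), (N, L)

regions = rng.standard_normal((16, 8))  # 16 image sub-regions, 8-dim features
words = rng.standard_normal((5, 8))     # 5 word embeddings
context, attn = word_attention(regions, words)
print(context.shape)  # (16, 8): one word-context vector per sub-region
```

Each row of `attn` sums to one, so every sub-region's context vector is a convex combination of word features, which is what lets a refinement stage focus on the most relevant words for that region.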
In this paper, we propose Stacked Generative Adversarial Networks (StackGAN) to generate high-resolution photo-realistic images.
By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond.
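That "sequential application of denoising autoencoders" can be illustrated with a toy DDPM-style reverse loop. In the sketch below, a 1-D signal stands in for an image, and an analytic oracle that knows the clean signal stands in for the trained noise-prediction network; the schedule values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear noise schedule: beta_t is the variance added at step t.
T = 50
betas = np.linspace(1e-4, 0.05, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def p_sample_loop(x0_oracle, shape):
    """Reverse process: start from pure noise and denoise step by step.
    A trained denoiser would predict the noise eps; here an analytic
    oracle that knows the clean signal stands in for it (toy assumption)."""
    x = rng.standard_normal(shape)
    for t in reversed(range(T)):
        # Oracle noise estimate: solve x_t = sqrt(ab)*x0 + sqrt(1-ab)*eps.
        eps = (x - np.sqrt(alpha_bars[t]) * x0_oracle) / np.sqrt(1.0 - alpha_bars[t])
        # DDPM posterior mean; inject fresh noise except at the final step.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        x = mean + (np.sqrt(betas[t]) * rng.standard_normal(shape) if t > 0 else 0.0)
    return x

target = np.sin(np.linspace(0, 2 * np.pi, 64))  # toy 1-D "image"
sample = p_sample_loop(target, target.shape)
print(np.abs(sample - target).mean())  # near zero: the loop recovers the signal
```

With a learned denoiser in place of the oracle, the same loop turns Gaussian noise into a sample from the learned image distribution; each iteration is one pass through a denoising autoencoder.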
We demonstrate how combining the inductive bias of CNNs with the expressivity of transformers enables transformers to model and thereby synthesize high-resolution images.
If the initial image is poorly generated, the subsequent stages can hardly refine it to satisfactory quality.
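This coarse-to-fine dependency can be sketched as a two-stage pipeline: a first stage drafts a low-resolution image, and a second stage upsamples it and adds conditioned detail, so the final output is only as good as the draft it refines. The linear maps, shapes, and clipping below are toy assumptions, not any paper's actual generator.

```python
import numpy as np

rng = np.random.default_rng(2)

def stage1(z, cond):
    """Stage-I sketch: map noise plus text conditioning to a coarse
    8x8 'image' in [-1, 1]. A trained generator would replace this."""
    return np.tanh((z + cond).reshape(8, 8))

def stage2(coarse, cond):
    """Stage-II sketch: upsample the coarse image 4x and add a small
    conditioned residual, refining rather than generating from scratch."""
    up = np.kron(coarse, np.ones((4, 4)))  # nearest-neighbour 8x8 -> 32x32
    detail = 0.1 * np.tanh(rng.standard_normal(up.shape) + cond.mean())
    return np.clip(up + detail, -1.0, 1.0)

cond = rng.standard_normal(64)  # toy text-conditioning vector
z = rng.standard_normal(64)     # noise input
coarse = stage1(z, cond)        # (8, 8) low-resolution draft
fine = stage2(coarse, cond)     # (32, 32) refined output
print(coarse.shape, fine.shape)
```

Because Stage II only perturbs an upsampled copy of the draft, any structural error in `coarse` survives into `fine`, which is exactly why a badly initialized first stage is hard to recover from.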
Conditional text-to-image generation is an active area of research, with many possible applications.