Text-to-image generation is the task of generating images from text descriptions or captions.
In this paper, we propose Stacked Generative Adversarial Networks (StackGAN), aimed at generating high-resolution photo-realistic images. The Stage-I GAN sketches the primitive shape and colors of the object based on the given text description, yielding low-resolution images.
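The two-stage pipeline can be sketched as follows. This is a toy numpy stand-in, not the real StackGAN: the linear map replacing the convolutional generator, the 64x64/256x256 resolutions, and the text-conditioned residual in Stage-II are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
TEXT_DIM, Z_DIM = 128, 100  # hypothetical embedding/noise sizes

def stage1_generator(text_emb, z):
    """Stage-I: sketch a low-resolution (64x64) image from text + noise.
    A random linear map stands in for the real convolutional generator."""
    w = rng.standard_normal((text_emb.size + z.size, 64 * 64 * 3)) * 0.01
    h = np.concatenate([text_emb, z]) @ w
    return np.tanh(h).reshape(64, 64, 3)

def stage2_generator(low_res, text_emb):
    """Stage-II: refine the Stage-I sketch into a 256x256 image,
    re-reading the text (here: upsample plus a placeholder residual)."""
    up = low_res.repeat(4, axis=0).repeat(4, axis=1)
    detail = 0.1 * np.tanh(text_emb.mean())  # placeholder text-conditioned detail
    return np.clip(up + detail, -1.0, 1.0)

text_emb = rng.standard_normal(TEXT_DIM)
z = rng.standard_normal(Z_DIM)
low = stage1_generator(text_emb, z)
high = stage2_generator(low, text_emb)
print(low.shape, high.shape)  # (64, 64, 3) (256, 256, 3)
```

The key structural point is that Stage-II conditions on both the Stage-I output and the text embedding, so the refinement stage can correct details the sketch missed.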
Synthesizing high-quality images from text descriptions is a challenging problem in computer vision and has many practical applications. Samples generated by existing text-to-image approaches can roughly reflect the meaning of the given descriptions, but they fail to contain necessary details and vivid object parts.
Automatic synthesis of realistic images from text would be interesting and useful, but current AI systems are still far from this goal. However, in recent years generic and powerful recurrent neural network architectures have been developed to learn discriminative text feature representations.
In this paper, we propose an Attentional Generative Adversarial Network (AttnGAN) that allows attention-driven, multi-stage refinement for fine-grained text-to-image generation. With a novel attentional generative network, the AttnGAN can synthesize fine-grained details in different subregions of the image by paying attention to the relevant words in the natural language description.
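The word-level attention can be sketched as a softmax over caption words for each image subregion. This is a minimal illustration of the mechanism, assuming dot-product similarity and toy dimensions; the real AttnGAN uses learned projections and a perceptron on top.

```python
import numpy as np

def word_attention(region_feats, word_feats):
    """For each image subregion, attend over the caption's word vectors and
    return a word-context vector per region (AttnGAN-style attention sketch).
    region_feats: (R, D), word_feats: (T, D)."""
    scores = region_feats @ word_feats.T             # (R, T) similarities
    scores -= scores.max(axis=1, keepdims=True)      # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)          # softmax over words
    return attn @ word_feats                         # (R, D) context per region

rng = np.random.default_rng(1)
regions = rng.standard_normal((16, 32))  # 16 subregions, 32-d features
words = rng.standard_normal((7, 32))     # 7 words in the caption
context = word_attention(regions, words)
print(context.shape)  # (16, 32)
```

Each region's context vector is then concatenated with its image features, so different subregions can draw on different words of the description.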
SOTA for Text-to-Image Generation on CUB
Motivated by the recent progress in generative models, we introduce a model that generates images from natural language descriptions. The proposed model iteratively draws patches on a canvas, while attending to the relevant words in the description.
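The iterative canvas idea can be sketched as a loop that adds one small patch per step, each step keyed to a word of the description. This is a toy stand-in for the recurrent attention model, with made-up patch placement; the real model learns where and what to draw.

```python
import numpy as np

def draw_on_canvas(word_feats, steps=5, size=8):
    """Iteratively add small patches to a canvas, each step 'attending'
    to one word vector (toy sketch of the iterative-drawing idea)."""
    canvas = np.zeros((size, size))
    for t in range(steps):
        word = word_feats[t % len(word_feats)]
        # patch location and content depend on the attended word (toy rule)
        r = int(abs(word[0]) * 100) % (size - 1)
        c = int(abs(word[1]) * 100) % (size - 1)
        canvas[r:r + 2, c:c + 2] += np.tanh(word[:4].reshape(2, 2))
    return canvas

words = np.random.default_rng(3).standard_normal((4, 4))  # 4 words, 4-d each
canvas = draw_on_canvas(words)
print(canvas.shape)  # (8, 8)
```

Because the canvas accumulates across steps, later words can refine or overwrite what earlier words drew.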
SOTA for Text-to-Image Generation on MS-COCO
The object pathway focuses solely on the individual objects and is iteratively applied at the locations specified by the bounding boxes. Our experiments show that through the use of the object pathway we can control object locations within images and can model complex scenes with multiple objects at various locations.
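The object pathway can be sketched as applying the same small generator at every bounding-box location and pasting its output into the global canvas. The patch content below is a made-up placeholder; only the control flow (one shared pathway, iterated over boxes) reflects the described idea.

```python
import numpy as np

def object_pathway(canvas, boxes, obj_feat):
    """Apply one shared object generator at each bounding box (r, c, h, w),
    pasting its output into the global canvas (object-pathway sketch)."""
    for (r, c, h, w) in boxes:
        patch = np.tanh(np.outer(obj_feat[:h], obj_feat[:w]))  # toy object patch
        canvas[r:r + h, c:c + w] = patch
    return canvas

canvas = np.zeros((16, 16))
boxes = [(1, 1, 4, 4), (9, 10, 5, 5)]  # two objects at specified locations
obj_feat = np.random.default_rng(4).standard_normal(8)
out = object_pathway(canvas, boxes, obj_feat)
print(out.shape)  # (16, 16)
```

Moving a box in the input moves the object in the output, which is exactly the location control the abstract describes.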
#2 best model for Text-to-Image Generation on COCO
To tackle this problem, we propose a multi-conditional GAN (MC-GAN) which controls both the object and background information jointly. The proposed synthesis block enables MC-GAN to generate a realistic object image with the desired background, controlling how much background information from the given base image is retained based on the foreground information from the text attributes.
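The background-control idea can be sketched as a learned gate that mixes background features with foreground (text) features. The sigmoid gate and the feature dimensions below are assumptions for illustration, not MC-GAN's actual block.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def synthesis_block(bg_feat, fg_text_feat):
    """Gate how much background information passes through, driven by the
    foreground text features (sketch of the background-control idea)."""
    gate = sigmoid(fg_text_feat)  # per-unit switch in (0, 1)
    return gate * bg_feat + (1.0 - gate) * fg_text_feat

bg = np.ones(8)      # toy background features from the base image
fg = np.zeros(8)     # neutral text features -> gate = 0.5 everywhere
mixed = synthesis_block(bg, fg)
print(mixed)  # all entries 0.5
```

Strong foreground evidence pushes the gate toward keeping or discarding background per feature, which is the joint object/background control the abstract claims.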