Zero-Shot Text-to-Image Generation
8 papers with code • 0 benchmarks • 0 datasets
These leaderboards are used to track progress in Zero-Shot Text-to-Image Generation.
Text-to-image generation in the general domain has long been an open problem: it requires both a powerful generative model and cross-modal understanding.
One of the major challenges in training text-to-image generation models is the need for a large number of high-quality image-text pairs.
We approach text-to-image generation by combining the power of the pretrained CLIP representation with an off-the-shelf image generator (a GAN), optimizing in the GAN's latent space to find images that maximize the CLIP score for the given input text.
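The latent-space search described above can be sketched as a simple score-ascent loop. This is a minimal, dependency-free illustration, not the papers' actual method: `clip_score` here is a toy quadratic surrogate (in the real setting it would be the CLIP image-text similarity of `generator(z)` against the prompt), and `TARGET` is a made-up latent used only so the sketch runs.

```python
import random

# Hypothetical stand-in: pretend this latent produces the image that best
# matches the text prompt. In the real method there is no such known target;
# the score comes from a pretrained CLIP model applied to generator(z).
TARGET = [0.3, -0.7, 0.5, 0.1]

def clip_score(z):
    # Toy surrogate score: higher when z is closer to TARGET.
    # (The real score is the CLIP cosine similarity between the generated
    # image and the input text.)
    return -sum((a - b) ** 2 for a, b in zip(z, TARGET))

def optimize_latent(dim=4, steps=500, sigma=0.1, seed=0):
    """Random-search ascent in the GAN latent space: perturb the latent and
    keep the perturbation whenever it improves the (CLIP) score."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    best = clip_score(z)
    for _ in range(steps):
        candidate = [zi + rng.gauss(0.0, sigma) for zi in z]
        score = clip_score(candidate)
        if score > best:
            z, best = candidate, score
    return z, best

z_opt, score = optimize_latent()
```

In practice the papers use gradient-based optimizers on the differentiable CLIP score rather than random search, but the structure — generate, score against the text, update the latent — is the same.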
Diffusion models have recently been shown to generate high-quality synthetic images, especially when paired with a guidance technique to trade off diversity for fidelity.
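The diversity-for-fidelity trade-off mentioned above is typically controlled by a guidance weight. As a minimal sketch (with toy lists standing in for the model's noise-prediction tensors), classifier-free guidance extrapolates from the unconditional prediction toward the text-conditional one:

```python
def guided_eps(eps_uncond, eps_cond, w):
    """Classifier-free guidance combination of two noise predictions:
    eps = eps_uncond + w * (eps_cond - eps_uncond).
    w = 1 recovers the plain conditional model; w > 1 pushes samples
    toward the prompt, trading diversity for fidelity."""
    return [eu + w * (ec - eu) for eu, ec in zip(eps_uncond, eps_cond)]

# Toy example: with w = 2 the combined prediction overshoots the
# conditional one, strengthening the text conditioning.
eps = guided_eps([0.0, 0.0], [1.0, 2.0], 2.0)
```

In a real diffusion sampler this combination is applied at every denoising step, with `eps_uncond` and `eps_cond` produced by one network run with and without the text conditioning.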
Contrastive models like CLIP have been shown to learn robust representations of images that capture both semantics and style.
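At inference time, a contrastive model like CLIP scores an image against candidate texts by cosine similarity, normalized with a temperature-scaled softmax. The sketch below uses tiny hand-written vectors as stand-ins for real CLIP embeddings, which are hundreds of dimensions:

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def match_probs(image_emb, text_embs, temperature=0.07):
    """CLIP-style zero-shot matching: temperature-scaled cosine similarities
    between one image embedding and several text embeddings, softmaxed into
    a probability over the candidate texts."""
    logits = [cosine(image_emb, t) / temperature for t in text_embs]
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy embeddings (assumptions, not real CLIP outputs): the image embedding
# is much closer to the first caption's embedding than to the second's.
img = [1.0, 0.0]
texts = [[0.9, 0.1], [0.0, 1.0]]
probs = match_probs(img, texts)
```

This same similarity is what the GAN-latent and diffusion approaches above maximize or guide toward, which is why a single pretrained CLIP can serve so many zero-shot generation pipelines.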