DrawBench Dataset | Papers With Code

**DrawBench** is a comprehensive and challenging benchmark for text-to-image models, introduced by the **Imagen** research team.

1. **Purpose and Context**:
   - **DrawBench** is an evaluation benchmark for text-to-image models.
   - It lets researchers and practitioners compare methods and identify their strengths and weaknesses in generating images from text prompts.

2. **Imagen: Text-to-Image Diffusion Models**:
   - **Imagen** is a text-to-image diffusion model developed by Google Research, Brain Team.
   - It pairs a large transformer language model (such as T5) for text understanding with diffusion models for high-fidelity image generation.
   - **Key Discovery**: Imagen demonstrates that generic large language models pretrained on text-only corpora are remarkably effective at encoding text for image synthesis.
   - **Photorealism and Language Understanding**: Imagen achieves an unprecedented degree of photorealism and a deep level of language understanding.
   - **FID Score**: It reports an FID (Fréchet Inception Distance) score of **7.27** on the COCO dataset, state of the art at the time of publication, without ever being trained on COCO.
   - **Human Raters' Perception**: Human raters find Imagen samples to be on par with the COCO data itself in terms of image-text alignment.

3. **DrawBench: A Comprehensive Benchmark**:
   - **DrawBench** is a set of text prompts used to evaluate text-to-image models through side-by-side human comparison.
   - The Imagen authors use it to compare Imagen with other recent methods, including **VQ-GAN+CLIP**, **Latent Diffusion Models**, and **DALL-E 2**.
   - In these side-by-side comparisons, human raters prefer Imagen over the other models in both sample quality and image-text alignment.

4. **Example Prompts for Imagen**:
   - Imagen generates diverse and imaginative images from text prompts. Example prompts:
     - A strawberry mug filled with white sesame seeds, floating in a dark chocolate sea.
     - A brain riding a rocketship heading towards the moon.
     - A dragon fruit wearing a karate belt in the snow.
     - A small cactus wearing a straw hat and neon sunglasses in the Sahara desert.
     - A photo of a Corgi dog riding a bike in Times Square, wearing sunglasses and a beach hat.
     - Teddy bears swimming at the Olympics 400m Butterfly event.
     - Sprouts in the shape of the text 'Imagen' coming out of a fairytale book.
     - A transparent sculpture of a duck made out of glass, in front of a painting of a landscape.
     - A single beam of light entering the room from the ceiling, illuminating an easel with a Rembrandt painting of a raccoon.

5. **Technical Details**:
   - Imagen uses a large frozen **T5-XXL** encoder to encode input text into embeddings.
   - Combining this frozen text encoder with diffusion-based image generation yields high-quality images that match the prompt.
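
The side-by-side comparison described in item 3 comes down to counting which model's output raters preferred for each prompt. The following is a minimal sketch of such a tally, assuming votes are recorded as (category, choice) pairs. The category labels, model names, the `no_preference` option, and the `preference_rates` helper are all hypothetical and are not taken from the DrawBench release or the Imagen evaluation code.

```python
from collections import Counter, defaultdict

# Hypothetical rater votes: (prompt category, preferred choice).
# All labels and votes here are made up for illustration.
votes = [
    ("Colors", "model_a"),
    ("Colors", "model_b"),
    ("Colors", "model_a"),
    ("Counting", "model_b"),
    ("Counting", "model_a"),
    ("Counting", "model_a"),
    ("Text", "model_a"),
    ("Text", "no_preference"),
]


def preference_rates(votes):
    """Return {category: {choice: fraction of that category's votes}}."""
    counts = defaultdict(Counter)
    for category, choice in votes:
        counts[category][choice] += 1

    rates = {}
    for category, counter in counts.items():
        total = sum(counter.values())
        rates[category] = {choice: n / total for choice, n in counter.items()}
    return rates


rates = preference_rates(votes)
for category, by_choice in rates.items():
    print(category, by_choice)
```

Reporting rates per category rather than one overall number keeps a model's weakness in one prompt type from being hidden by its strength in another.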

Source: Conversation with Bing, 3/18/2024
(1) Imagen: Text-to-Image Diffusion Models. https://imagen.research.google/.
(2) Evaluating Diffusion Models - Hugging Face. https://huggingface.co/docs/diffusers/conceptual/evaluation.
(3) shunk031/DrawBench · Datasets at Hugging Face. https://huggingface.co/datasets/shunk031/DrawBench.
(4) sayakpaul/drawbench · Datasets at Hugging Face. https://huggingface.co/datasets/sayakpaul/drawbench.



Similar Datasets

HEIM

Pick-a-Pic

