Winoground

Introduced by Thrush et al. in Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality

Winoground is a dataset for evaluating the ability of vision and language models to conduct visio-linguistic compositional reasoning. Given two images and two captions, the goal is to match each caption to its image -- but crucially, both captions contain the same set of words, only in a different order. The dataset was hand-curated by expert annotators and is labeled with fine-grained tags to assist in analyzing model performance.
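
The matching task described above can be sketched in a few lines. This is a minimal illustration, not the official evaluation code: it assumes a model yields a similarity score for every caption-image pair, and the names `same_words` and `match_example` are hypothetical helpers introduced here. The text, image, and group checks follow the score definitions in the Winoground paper by Thrush et al.

```python
from collections import Counter
from typing import Dict, Sequence


def same_words(caption_a: str, caption_b: str) -> bool:
    """Return True if both captions use the same words, ignoring order and case."""
    return Counter(caption_a.lower().split()) == Counter(caption_b.lower().split())


def match_example(sim: Sequence[Sequence[float]]) -> Dict[str, bool]:
    """Check one example given sim[c][i], the score of caption c with image i.

    Caption 0 belongs to image 0 and caption 1 belongs to image 1.
    """
    # Text check: for each image, the correct caption must score higher.
    text_ok = sim[0][0] > sim[1][0] and sim[1][1] > sim[0][1]
    # Image check: for each caption, the correct image must score higher.
    image_ok = sim[0][0] > sim[0][1] and sim[1][1] > sim[1][0]
    # Group check: both of the above must hold.
    return {"text": text_ok, "image": image_ok, "group": text_ok and image_ok}


# Illustrative caption pair (assumed for this sketch, not loaded from the dataset).
captions = ("some plants surrounding a lightbulb", "a lightbulb surrounding some plants")
print(same_words(*captions))

# Hypothetical similarity scores: rows are captions, columns are images.
print(match_example([[0.9, 0.1], [0.2, 0.8]]))
```

Because the two captions differ only in word order, a model that treats a caption as a bag of words produces the same score for both and fails these strict comparisons.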

Benchmarks

Task	Dataset Variant	Best Model
Visual Reasoning	Winoground	GPT-4V

Tasks

Image Captioning
Visual Reasoning

Similar Datasets

ARO

ImageCoDe

VALSE

WinoGAViL

License

Unknown

Modalities

Images
Texts

Languages

English
