TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Visual Reasoning	Winoground	Random chance	Text Score	25.00	# 85
Visual Reasoning	Winoground	Random chance	Image Score	25.00	# 34
Visual Reasoning	Winoground	Random chance	Group Score	16.67	# 40
Visual Reasoning	Winoground	VSRN (Flickr30k)	Text Score	20.00	# 97
Visual Reasoning	Winoground	VSRN (Flickr30k)	Image Score	5.00	# 108
Visual Reasoning	Winoground	VSRN (Flickr30k)	Group Score	3.50	# 98
Visual Reasoning	Winoground	VisualBERT base	Text Score	15.50	# 109
Visual Reasoning	Winoground	VisualBERT base	Image Score	2.50	# 110
Visual Reasoning	Winoground	VisualBERT base	Group Score	1.50	# 104
Visual Reasoning	Winoground	VSRN (COCO)	Text Score	17.50	# 105
Visual Reasoning	Winoground	VSRN (COCO)	Image Score	7.00	# 102
Visual Reasoning	Winoground	VSRN (COCO)	Group Score	3.75	# 97
Visual Reasoning	Winoground	VSE++ (COCO, VGG)	Text Score	18.75	# 103
Visual Reasoning	Winoground	VSE++ (COCO, VGG)	Image Score	5.50	# 106
Visual Reasoning	Winoground	VSE++ (COCO, VGG)	Group Score	3.50	# 98
Visual Reasoning	Winoground	LXMERT	Text Score	19.25	# 101
Visual Reasoning	Winoground	LXMERT	Image Score	7.00	# 102
Visual Reasoning	Winoground	LXMERT	Group Score	4.00	# 93
Visual Reasoning	Winoground	UniT (ITM finetuned)	Text Score	19.50	# 100
Visual Reasoning	Winoground	UniT (ITM finetuned)	Image Score	6.25	# 104
Visual Reasoning	Winoground	UniT (ITM finetuned)	Group Score	4.00	# 93
Visual Reasoning	Winoground	VSE++ (Flickr30k, VGG)	Text Score	19.75	# 99
Visual Reasoning	Winoground	VSE++ (Flickr30k, VGG)	Image Score	6.25	# 104
Visual Reasoning	Winoground	VSE++ (Flickr30k, VGG)	Group Score	4.50	# 91
Visual Reasoning	Winoground	VSE++ (Flickr30k, ResNet)	Text Score	20.00	# 97
Visual Reasoning	Winoground	VSE++ (Flickr30k, ResNet)	Image Score	5.00	# 108
Visual Reasoning	Winoground	VSE++ (Flickr30k, ResNet)	Group Score	2.75	# 101
Visual Reasoning	Winoground	VSE++ (COCO, ResNet)	Text Score	22.75	# 92
Visual Reasoning	Winoground	VSE++ (COCO, ResNet)	Image Score	8.00	# 96
Visual Reasoning	Winoground	VSE++ (COCO, ResNet)	Group Score	4.00	# 93
Visual Reasoning	Winoground	ViLBERT base	Text Score	23.75	# 89
Visual Reasoning	Winoground	ViLBERT base	Image Score	7.25	# 100
Visual Reasoning	Winoground	ViLBERT base	Group Score	4.75	# 90
Visual Reasoning	Winoground	FLAVA (contrastive)	Text Score	25.25	# 84
Visual Reasoning	Winoground	FLAVA (contrastive)	Image Score	13.50	# 78
Visual Reasoning	Winoground	FLAVA (contrastive)	Group Score	9.00	# 75
Visual Reasoning	Winoground	ViLLA base	Text Score	30.00	# 68
Visual Reasoning	Winoground	ViLLA base	Image Score	12.00	# 84
Visual Reasoning	Winoground	ViLLA base	Group Score	8.00	# 80
Visual Reasoning	Winoground	CLIP (ViT-B/32)	Text Score	30.75	# 62
Visual Reasoning	Winoground	CLIP (ViT-B/32)	Image Score	10.50	# 91
Visual Reasoning	Winoground	CLIP (ViT-B/32)	Group Score	8.00	# 80
Visual Reasoning	Winoground	UNITER base	Text Score	32.25	# 58
Visual Reasoning	Winoground	UNITER base	Image Score	13.25	# 80
Visual Reasoning	Winoground	UNITER base	Group Score	10.00	# 69
Visual Reasoning	Winoground	FLAVA (ITM)	Text Score	32.25	# 58
Visual Reasoning	Winoground	FLAVA (ITM)	Image Score	20.50	# 50
Visual Reasoning	Winoground	FLAVA (ITM)	Group Score	14.25	# 48
Visual Reasoning	Winoground	ViLT (ViT-B/32)	Text Score	34.75	# 52
Visual Reasoning	Winoground	ViLT (ViT-B/32)	Image Score	14.00	# 73
Visual Reasoning	Winoground	ViLT (ViT-B/32)	Group Score	9.25	# 74
Visual Reasoning	Winoground	ViLLA large	Text Score	37.00	# 44
Visual Reasoning	Winoground	ViLLA large	Image Score	13.25	# 80
Visual Reasoning	Winoground	ViLLA large	Group Score	11.00	# 64
Visual Reasoning	Winoground	VinVL	Text Score	37.75	# 43
Visual Reasoning	Winoground	VinVL	Image Score	17.75	# 58
Visual Reasoning	Winoground	VinVL	Group Score	14.50	# 46
Visual Reasoning	Winoground	UNITER large	Text Score	38.00	# 41
Visual Reasoning	Winoground	UNITER large	Image Score	14.00	# 73
Visual Reasoning	Winoground	UNITER large	Group Score	10.50	# 66
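
The "Random chance" rows above (25.00, 25.00, 16.67) can be reproduced by brute force. The sketch below assumes a model that assigns four distinct, uniformly random scores to the four caption-image pairings, and it assumes the metric definitions as given in the Winoground paper: the text score needs the correct caption to win for each image, the image score needs the correct image to win for each caption, and the group score needs both. The function and key names are illustrative only.

```python
from fractions import Fraction
from itertools import permutations

# Names of the four caption-image pairings (hypothetical labels).
# Caption 0 belongs with image 0, caption 1 with image 1.
PAIRS = ("c0_i0", "c0_i1", "c1_i0", "c1_i1")


def chance_rates():
    """Enumerate all 24 strict orderings of the four scores (no ties assumed)."""
    text = image = group = total = 0
    for ranks in permutations(range(4)):
        s = dict(zip(PAIRS, ranks))
        # Text: for each image, the matching caption outscores the other caption.
        text_ok = s["c0_i0"] > s["c1_i0"] and s["c1_i1"] > s["c0_i1"]
        # Image: for each caption, the matching image outscores the other image.
        image_ok = s["c0_i0"] > s["c0_i1"] and s["c1_i1"] > s["c1_i0"]
        text += int(text_ok)
        image += int(image_ok)
        group += int(text_ok and image_ok)
        total += 1
    return {
        "text": Fraction(text, total),
        "image": Fraction(image, total),
        "group": Fraction(group, total),
    }


if __name__ == "__main__":
    for name, rate in chance_rates().items():
        print(f"{name}: {rate} = {100 * float(rate):.2f}%")
```

Under these assumptions the group criterion holds only when both correct pairings outrank both incorrect ones, which is 4 of the 24 orderings, so the group baseline is lower than the product-free 25% of the other two metrics.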

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/winoground-probing-vision-and-language-models/visual-reasoning-on-winoground)](https://paperswithcode.com/sota/visual-reasoning-on-winoground?p=winoground-probing-vision-and-language-models)`

Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality

CVPR 2022 · Tristan Thrush, Ryan Jiang, Max Bartolo, Amanpreet Singh, Adina Williams, Douwe Kiela, Candace Ross

We present a novel task and dataset for evaluating the ability of vision and language models to conduct visio-linguistic compositional reasoning, which we call Winoground. Given two images and two captions, the goal is to match them correctly - but crucially, both captions contain a completely identical set of words, only in a different order. The dataset was carefully hand-curated by expert annotators and is labeled with a rich set of fine-grained tags to assist in analyzing model performance. We probe a diverse range of state-of-the-art vision and language models and find that, surprisingly, none of them do much better than chance. Evidently, these models are not as skilled at visio-linguistic compositional reasoning as we might have hoped. We perform an extensive analysis to obtain insights into how future work might try to mitigate these models' shortcomings. We aim for Winoground to serve as a useful evaluation set for advancing the state of the art and driving further progress in the field. The dataset is available at https://huggingface.co/datasets/facebook/winoground.
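
The matching task described in the abstract can be scored roughly as follows. This is a minimal sketch, not the authors' evaluation code: it assumes a model exposes a similarity score for each caption-image pairing and uses the text, image, and group criteria as defined in the paper. The `Example` container, the function names, and the numbers in the usage note are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Example:
    """Model similarity scores for one item (hypothetical container).

    c0_i0 is the score for caption 0 paired with image 0, and so on.
    Caption 0 belongs with image 0, caption 1 with image 1.
    """
    c0_i0: float
    c0_i1: float
    c1_i0: float
    c1_i1: float


def text_correct(ex: Example) -> bool:
    # For each image, the matching caption must outscore the other caption.
    return ex.c0_i0 > ex.c1_i0 and ex.c1_i1 > ex.c0_i1


def image_correct(ex: Example) -> bool:
    # For each caption, the matching image must outscore the other image.
    return ex.c0_i0 > ex.c0_i1 and ex.c1_i1 > ex.c1_i0


def group_correct(ex: Example) -> bool:
    # Both the text and the image criteria must hold for the same item.
    return text_correct(ex) and image_correct(ex)


def winoground_scores(examples: Iterable[Example]) -> Dict[str, float]:
    """Return text, image, and group scores as percentages."""
    items = list(examples)
    if not items:
        raise ValueError("need at least one example")
    n = len(items)
    return {
        "text": 100.0 * sum(text_correct(e) for e in items) / n,
        "image": 100.0 * sum(image_correct(e) for e in items) / n,
        "group": 100.0 * sum(group_correct(e) for e in items) / n,
    }
```

With two made-up items, `winoground_scores([Example(0.9, 0.1, 0.2, 0.8), Example(0.6, 0.7, 0.5, 0.8)])` counts the first item as correct on all three criteria and the second as correct on text only, since its caption 0 scores higher with image 1 than with image 0.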

Code

wangt-cn/eqben

120

juletx/spatial-reasoning

Tasks

Visual Reasoning

Datasets

Introduced in the Paper:

Winoground

Used in the Paper:

MS COCO

SST

SST-2

Visual Genome

Flickr30k

WSC

YFCC100M

WIT

Results from the Paper

Ranked #41 on Visual Reasoning on Winoground

