CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions

CVPR 2019 · Runtao Liu, Chenxi Liu, Yutong Bai, Alan Yuille

Referring object detection and referring image segmentation are important tasks that require joint understanding of visual information and natural language. Yet there has been evidence that current benchmark datasets suffer from bias, and current state-of-the-art models cannot be easily evaluated on their intermediate reasoning process. To address these issues and complement similar efforts in visual question answering, we build CLEVR-Ref+, a synthetic diagnostic dataset for referring expression comprehension. The precise locations and attributes of the objects are readily available, and the referring expressions are automatically associated with functional programs. The synthetic nature allows control over dataset bias (through sampling strategy), and the modular programs enable intermediate reasoning ground truth without human annotators. In addition to evaluating several state-of-the-art models on CLEVR-Ref+, we also propose IEP-Ref, a module network approach that significantly outperforms other models on our dataset. In particular, we present two interesting and important findings using IEP-Ref: (1) the module trained to transform feature maps into segmentation masks can be attached to any intermediate module to reveal the entire reasoning process step-by-step; (2) even if all training data has at least one object referred, IEP-Ref can correctly predict no-foreground when presented with false-premise referring expressions. To the best of our knowledge, this is the first direct and quantitative proof that neural modules behave in the way they are intended.
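To illustrate what it means for a referring expression to be associated with a functional program, here is a minimal symbolic sketch. It is not the dataset's actual program vocabulary or the IEP-Ref model: the `Obj` fields, the operation names `filter_color` and `filter_shape`, and the toy scene are all hypothetical. Each step narrows a set of object indices, and the recorded trace plays the role of intermediate reasoning ground truth.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    # Hypothetical scene-object record; field names are assumptions.
    idx: int
    color: str
    shape: str


def run_program(objects, program):
    """Execute a chain of (op, arg) steps and return the per-step trace.

    Every step filters the currently selected object indices, so the
    trace exposes the intermediate result after each operation.
    """
    selected = {o.idx for o in objects}
    trace = []
    for op, arg in program:
        if op == "filter_color":
            selected = {o.idx for o in objects
                        if o.idx in selected and o.color == arg}
        elif op == "filter_shape":
            selected = {o.idx for o in objects
                        if o.idx in selected and o.shape == arg}
        else:
            raise ValueError(f"unknown op: {op}")
        trace.append((op, arg, sorted(selected)))
    return trace


# Toy scene (made up for illustration).
SCENE = [Obj(0, "red", "cube"), Obj(1, "blue", "sphere"), Obj(2, "red", "sphere")]

# "the red sphere"
red_sphere = run_program(SCENE, [("filter_color", "red"), ("filter_shape", "sphere")])

# False-premise expression: "the green cube" refers to nothing in the scene.
green_cube = run_program(SCENE, [("filter_color", "green"), ("filter_shape", "cube")])

for step in red_sphere:
    print(step)
print(green_cube[-1])
```

In this toy setting the false-premise expression ends with an empty selection, which is the symbolic analogue of predicting no foreground.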

Code

ruotianluo/iep-ref

arjunakula/neurips2021

byahn2/clevr_ref

Tasks

Image Segmentation

Object Detection

Question Answering

Referring Expression

Referring Expression Comprehension

Referring Expression Segmentation

Semantic Segmentation

Visual Question Answering

Visual Question Answering (VQA)

Visual Reasoning

Datasets

Introduced in the Paper:

CLEVR-Ref+

Used in the Paper:

Visual Question Answering

CLEVR

RefCOCO

Results from the Paper

Ranked #1 on Referring Expression Segmentation on CLEVR-Ref+

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Referring Expression Segmentation	CLEVR-Ref+	IEP-Ref (700K prog.)	IoU	80.6	#1
Referring Expression Comprehension	CLEVR-Ref+	MAttNet [34]	Accuracy	60.9	#4
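For readers unfamiliar with the IoU metric reported for segmentation, the following is a minimal sketch of intersection-over-union between a predicted and a ground-truth binary mask. How the benchmark aggregates scores over the dataset is not stated on this page, so this shows only the per-mask computation. The function name `mask_iou` and the convention of returning 1.0 when both masks are empty (the no-foreground case for a false-premise expression) are assumptions.

```python
import numpy as np


def mask_iou(pred, target):
    """Intersection-over-union of two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Assumed convention: both masks empty counts as a perfect match.
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(intersection) / float(union)


# Toy 2x3 masks (made up for illustration).
pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
target = np.array([[1, 0, 0],
                   [0, 1, 1]])

print(mask_iou(pred, target))  # 2 overlapping pixels / 4 pixels in the union = 0.5
```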
