Referring Expression
116 papers with code • 1 benchmark • 3 datasets
Referring expression comprehension takes an image and a natural-language description and places a bounding box around the instance the description refers to.
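The task interface can be sketched as follows. This is a toy illustration only: the region "captions" and the word-overlap scorer are stand-ins for a learned vision-language model, not any specific paper's method.

```python
# Toy sketch of referring expression comprehension (ReC):
# given candidate regions of an image and a natural-language expression,
# return the bounding box whose content best matches the expression.

def score(expression: str, region_caption: str) -> float:
    """Toy matching score: fraction of expression words found in the caption.
    A real system would use a learned image-text similarity instead."""
    expr_words = set(expression.lower().split())
    cap_words = set(region_caption.lower().split())
    return len(expr_words & cap_words) / max(len(expr_words), 1)

def comprehend(expression, regions):
    """Pick the box whose (toy) caption best matches the expression."""
    return max(regions, key=lambda r: score(expression, r[1]))[0]

# Candidate regions: (bounding box (x, y, w, h), toy caption)
regions = [
    ((10, 20, 50, 80), "a man in a red shirt"),
    ((120, 30, 40, 90), "a woman holding an umbrella"),
    ((200, 40, 60, 60), "a brown dog on the grass"),
]

print(comprehend("the dog on the grass", regions))  # → (200, 40, 60, 60)
```

In practice the candidate regions come from a detector or proposal network (two-stage methods) or are regressed directly from the image (one-stage methods).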
Libraries

Use these libraries to find Referring Expression models and implementations.

Most implemented papers
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions
The speaker generates referring expressions, the listener comprehends referring expressions, and the reinforcer introduces a reward function to guide sampling of more discriminative expressions.
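A crude, self-contained sketch of the reinforcer idea, not the paper's actual speaker-listener architecture: a speaker samples expressions from a small candidate set, a reward favors expressions that match only the target, and a REINFORCE-style update on the softmax logits shifts sampling toward more discriminative expressions. The candidate set, ambiguity counts, and hyperparameters below are all invented for illustration.

```python
import math
import random

random.seed(0)

candidates = ["a dog", "a brown dog", "the brown dog on the left"]
# Toy "discriminativeness": how many scene objects each expression could match
# (1 = unambiguous, so it earns the highest reward).
ambiguity = {"a dog": 3, "a brown dog": 2, "the brown dog on the left": 1}
logits = {c: 0.0 for c in candidates}

def probs():
    """Softmax over the current logits."""
    z = sum(math.exp(v) for v in logits.values())
    return {c: math.exp(v) / z for c, v in logits.items()}

def sample(p):
    """Draw one expression from the categorical distribution p."""
    r, acc = random.random(), 0.0
    for c, pc in p.items():
        acc += pc
        if r <= acc:
            return c
    return candidates[-1]

lr, baseline = 0.5, 0.5
for _ in range(500):
    p = probs()
    expr = sample(p)
    reward = 1.0 / ambiguity[expr]  # unambiguous expressions earn more reward
    # REINFORCE update: grad of log softmax w.r.t. logit c is 1{c=expr} - p[c]
    for c in candidates:
        logits[c] += lr * (reward - baseline) * ((c == expr) - p[c])

print(max(logits, key=logits.get))  # → the brown dog on the left
```

After training, the most discriminative expression dominates the sampling distribution, which is the behavior the reinforcer's reward is meant to induce.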
Generating Easy-to-Understand Referring Expressions for Target Identifications
Moreover, we regard easy-to-understand sentences as those that humans comprehend both correctly and quickly.
A Fast and Accurate One-Stage Approach to Visual Grounding
We propose a simple, fast, and accurate one-stage approach to visual grounding.
Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation
In addition, we address a key challenge in this multi-task setup, i.e., the prediction conflict, with two innovative designs, namely Consistency Energy Maximization (CEM) and Adaptive Soft Non-Located Suppression (ASNLS).
Large-Scale Adversarial Training for Vision-and-Language Representation Learning
We present VILLA, the first known effort on large-scale adversarial training for vision-and-language (V+L) representation learning.
Unifying Vision-and-Language Tasks via Text Generation
On 7 popular vision-and-language benchmarks (including visual question answering, referring expression comprehension, and visual commonsense reasoning), most of which have previously been modeled as discriminative tasks, our generative approach with a single unified architecture reaches performance comparable to recent task-specific state-of-the-art vision-and-language models.
Airbert: In-domain Pretraining for Vision-and-Language Navigation
Given the scarcity of domain-specific training data and the high diversity of image and language inputs, the generalization of VLN agents to unseen environments remains challenging.
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension
Training a referring expression comprehension (ReC) model for a new visual domain requires collecting referring expressions, and potentially corresponding bounding boxes, for images in the domain.
The Project Dialogism Novel Corpus: A Dataset for Quotation Attribution in Literary Texts
We present the Project Dialogism Novel Corpus, or PDNC, an annotated dataset of quotations for English literary texts.
GRES: Generalized Referring Expression Segmentation
Existing classic RES datasets and methods commonly support single-target expressions only, i.e., one expression refers to one target object.
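The generalized setting can be illustrated with a small sketch: unlike classic ReC/RES, which always returns exactly one target, a GRES-style expression may refer to several objects or to none. The thresholded word-overlap scorer and toy captions below are illustrative stand-ins for a learned model, not the GRES method itself.

```python
# Toy sketch of generalized referring expressions: return every region whose
# (toy) caption matches the expression well enough, so the result can contain
# multiple boxes (multi-target) or be empty (no-target).

def score(expression: str, region_caption: str) -> float:
    """Toy matching score: fraction of expression words found in the caption."""
    e = set(expression.lower().split())
    c = set(region_caption.lower().split())
    return len(e & c) / max(len(e), 1)

def comprehend_generalized(expression, regions, threshold=0.6):
    """Return all boxes scoring at or above the threshold (possibly none)."""
    return [box for box, cap in regions if score(expression, cap) >= threshold]

# Candidate regions: (bounding box (x, y, w, h), toy caption)
regions = [
    ((10, 20, 50, 80), "a man in a red shirt"),
    ((120, 30, 40, 90), "a man holding an umbrella"),
    ((200, 40, 60, 60), "a brown dog on the grass"),
]

print(comprehend_generalized("a man", regions))       # → two boxes
print(comprehend_generalized("a blue car", regions))  # → [] (no target)
```

Returning a set of regions rather than a single argmax is what distinguishes the generalized formulation from the classic single-target one.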