Referring Expression

116 papers with code • 1 benchmark • 3 datasets

Referring expression comprehension localizes the object instance that a natural-language description refers to, placing a bounding box around it in the given image.
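
A minimal sketch of the task's input/output contract; the `comprehend` function and `Box` type here are hypothetical, not any particular library's API:

```python
from dataclasses import dataclass

@dataclass
class Box:
    """Axis-aligned bounding box in pixel coordinates."""
    x1: float
    y1: float
    x2: float
    y2: float

def comprehend(image, expression: str) -> Box:
    """Hypothetical model interface: localize the instance that the
    natural-language expression refers to in the image."""
    raise NotImplementedError

# e.g. comprehend(img, "the woman in the red coat on the left")
```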

Most implemented papers

A Joint Speaker-Listener-Reinforcer Model for Referring Expressions

lichengunc/speaker_listener_reinforcer CVPR 2017

The speaker generates referring expressions, the listener comprehends referring expressions, and the reinforcer introduces a reward function to guide sampling of more discriminative expressions.
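
A minimal PyTorch sketch of the listener and reinforcer objectives; the function names, margin value, and hinge formulation are illustrative, not the paper's exact losses:

```python
import torch
import torch.nn.functional as F

def listener_ranking_loss(obj_emb, expr_emb, margin=0.1):
    """Listener: max-margin ranking loss in a joint embedding space,
    pushing each expression to score highest on its own object."""
    scores = obj_emb @ expr_emb.t()                   # (B, B) pairwise similarities
    pos = scores.diag().unsqueeze(1)                  # matched object-expression pairs
    off_diag = ~torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    loss_obj = F.relu(margin + scores - pos)[off_diag].mean()       # confusing objects
    loss_expr = F.relu(margin + scores.t() - pos)[off_diag].mean()  # confusing expressions
    return loss_obj + loss_expr

def reinforcer_loss(expr_log_probs, reward):
    """Reinforcer: REINFORCE-style gradient that rewards sampled
    expressions the listener finds discriminative. `expr_log_probs`
    is the summed token log-probability of each sampled expression."""
    return -(reward.detach() * expr_log_probs).mean()
```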

Generating Easy-to-Understand Referring Expressions for Target Identifications

mikittt/easy-to-understand-REG ICCV 2019

Moreover, we regard easy-to-understand sentences as those that humans comprehend both correctly and quickly.
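
One way to operationalize that criterion is a score that is zero for a wrong identification and decays with response time; this is a hypothetical illustration, not the paper's evaluation protocol:

```python
def ease_of_understanding(correct: bool, response_time: float,
                          time_limit: float = 10.0) -> float:
    """Hypothetical score: 0 for a wrong identification, otherwise
    higher the faster a human locates the referred target."""
    if not correct:
        return 0.0
    return max(0.0, 1.0 - response_time / time_limit)
```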

A Fast and Accurate One-Stage Approach to Visual Grounding

zyang-ur/onestage_grounding ICCV 2019

We propose a simple, fast, and accurate one-stage approach to visual grounding, inspired by the following insight.
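
The one-stage idea, roughly: fuse the expression embedding into a dense visual feature map and regress the box directly, rather than ranking precomputed region proposals. A simplified PyTorch sketch with illustrative layer sizes:

```python
import torch
import torch.nn as nn

class OneStageGrounder(nn.Module):
    """Fuse the expression embedding into every spatial cell of a
    dense visual feature map and regress the box directly."""
    def __init__(self, vis_dim=256, txt_dim=256):
        super().__init__()
        self.fuse = nn.Conv2d(vis_dim + txt_dim, 256, kernel_size=1)
        self.head = nn.Conv2d(256, 5, kernel_size=1)  # per cell: confidence + 4 box offsets

    def forward(self, vis_feat, txt_emb):
        # vis_feat: (B, C, H, W) from a detector backbone; txt_emb: (B, D)
        B, _, H, W = vis_feat.shape
        txt_map = txt_emb[:, :, None, None].expand(B, -1, H, W)
        fused = torch.relu(self.fuse(torch.cat([vis_feat, txt_map], dim=1)))
        return self.head(fused)  # at test time, decode the highest-confidence cell
```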

Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation

luogen1996/MCN CVPR 2020

In addition, we address a key challenge of this multi-task setup, i.e., the prediction conflict, with two innovative designs, namely Consistency Energy Maximization (CEM) and Adaptive Soft Non-Located Suppression (ASNLS).
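
A PyTorch sketch of the ASNLS side of that design: rather than hard-cropping the segmentation map to the detected box, responses outside the box are softly down-weighted. The fixed `decay` factor here stands in for the paper's adaptive weighting:

```python
import torch

def asnls(mask_logits, box, decay=0.1):
    """Down-weight segmentation responses outside the detected box
    instead of zeroing them. `box` = (x1, y1, x2, y2) pixel indices;
    `decay` is a simplification of the paper's adaptive weight."""
    weights = torch.full_like(mask_logits, decay)
    x1, y1, x2, y2 = box
    weights[y1:y2, x1:x2] = 1.0
    return mask_logits * weights
```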

Large-Scale Adversarial Training for Vision-and-Language Representation Learning

zhegan27/VILLA NeurIPS 2020

We present VILLA, the first known effort on large-scale adversarial training for vision-and-language (V+L) representation learning.
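
The broad recipe is adversarial training in the embedding space rather than on raw pixels or tokens. A simplified single-step sketch in PyTorch (VILLA's actual algorithm is a "free" multi-step variant; `model` here is assumed to accept embeddings directly):

```python
import torch

def adversarial_step(model, embeds, labels, loss_fn, eps=1e-2):
    """One-step adversarial perturbation of input embeddings: take
    the loss gradient w.r.t. the embeddings, add a small normalized
    perturbation, and also train on the perturbed forward pass."""
    embeds = embeds.detach().requires_grad_(True)
    clean_loss = loss_fn(model(embeds), labels)
    grad, = torch.autograd.grad(clean_loss, embeds, retain_graph=True)
    delta = eps * grad / (grad.norm() + 1e-12)
    adv_loss = loss_fn(model(embeds + delta), labels)
    return clean_loss + adv_loss
```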

Unifying Vision-and-Language Tasks via Text Generation

j-min/VL-T5 4 Feb 2021

On 7 popular vision-and-language benchmarks, including visual question answering, referring expression comprehension, and visual commonsense reasoning, most of which have previously been modeled as discriminative tasks, our generative approach (with a single unified architecture) reaches performance comparable to recent task-specific state-of-the-art vision-and-language models.
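
A sketch of how a discriminative task like referring expression comprehension becomes text generation under this recipe; the prefix and token names below are illustrative:

```python
# Visual regions are represented as special tokens <vis_0> ... <vis_N>,
# and comprehension is trained as ordinary sequence generation: the
# target is simply the token of the referred region.
source = "grounding: the man holding a red umbrella"    # task prefix + expression
regions = ["<vis_0>", "<vis_1>", "<vis_2>", "<vis_3>"]  # one token per detected region
target = "<vis_2>"                                      # generated as plain text
```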

Airbert: In-domain Pretraining for Vision-and-Language Navigation

airbert-vln/airbert ICCV 2021

Given the scarcity of domain-specific training data and the high diversity of image and language inputs, the generalization of VLN agents to unseen environments remains challenging.

ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension

allenai/reclip ACL 2022

Training a referring expression comprehension (ReC) model for a new visual domain requires collecting referring expressions, and potentially corresponding bounding boxes, for images in the domain.
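
A minimal zero-shot baseline in the spirit of ReCLIP, assuming the openai/CLIP package: score each candidate box by CLIP similarity between its crop and the expression. ReCLIP itself goes further, with isolated-proposal scoring and spatial-relation handling:

```python
import clip  # openai/CLIP
import torch
from PIL import Image

def ground(image: Image.Image, expression: str, boxes):
    """Crop each candidate box, embed crops and the expression with
    CLIP, and return the best-matching box."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)
    crops = torch.stack([preprocess(image.crop(b)) for b in boxes]).to(device)
    text = clip.tokenize([expression]).to(device)
    with torch.no_grad():
        img_emb = model.encode_image(crops)   # (N, D)
        txt_emb = model.encode_text(text)     # (1, D)
    sims = torch.nn.functional.cosine_similarity(img_emb, txt_emb)
    return boxes[sims.argmax().item()]
```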

The Project Dialogism Novel Corpus: A Dataset for Quotation Attribution in Literary Texts

priya22/pdnc-lrec2022 LREC 2022

We present the Project Dialogism Novel Corpus, or PDNC, an annotated dataset of quotations for English literary texts.

GRES: Generalized Referring Expression Segmentation

henghuiding/ReLA CVPR 2023

Existing classic RES datasets and methods commonly support single-target expressions only, i.e., one expression refers to one target object.
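
GRES relaxes this restriction: an expression may refer to several instances, or to none at all. A NumPy sketch of the generalized output space (the function and its arguments are illustrative):

```python
import numpy as np

def gres_output(instance_masks, referred_ids):
    """Union of the masks of all referred instances; an empty id
    list models a no-target expression and yields an empty mask."""
    out = np.zeros_like(instance_masks[0], dtype=bool)
    for i in referred_ids:
        out |= instance_masks[i]
    return out
```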