Referring Expression
116 papers with code • 1 benchmark • 3 datasets
Referring expression comprehension takes an image and a natural-language description and places a bounding box around the instance the description refers to.
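The task interface can be sketched as follows. This is a toy illustration only: the region "captions" and the word-overlap scorer are stand-ins for a learned vision-language model, not any specific paper's method.

```python
# Toy sketch of referring expression comprehension (ReC):
# given candidate regions of an image and a natural-language expression,
# return the bounding box whose content best matches the expression.

def score(expression: str, region_caption: str) -> float:
    """Toy matching score: fraction of expression words found in the caption.
    A real system would use a learned image-text similarity instead."""
    expr_words = set(expression.lower().split())
    cap_words = set(region_caption.lower().split())
    return len(expr_words & cap_words) / max(len(expr_words), 1)

def comprehend(expression, regions):
    """Pick the box whose (toy) caption best matches the expression."""
    return max(regions, key=lambda r: score(expression, r[1]))[0]

# Candidate regions: (bounding box (x, y, w, h), toy caption)
regions = [
    ((10, 20, 50, 80), "a man in a red shirt"),
    ((120, 30, 40, 90), "a woman holding an umbrella"),
    ((200, 40, 60, 60), "a brown dog on the grass"),
]

print(comprehend("the dog on the grass", regions))  # → (200, 40, 60, 60)
```

In practice the candidate regions come from a detector or proposal network (two-stage methods) or are regressed directly from the image (one-stage methods).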
Libraries

Use these libraries to find Referring Expression models and implementations.

Most implemented papers
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions
The speaker generates referring expressions, the listener comprehends referring expressions, and the reinforcer introduces a reward function to guide sampling of more discriminative expressions.
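A crude, self-contained sketch of the reinforcer idea, not the paper's actual speaker-listener architecture: a speaker samples expressions from a small candidate set, a reward favors expressions that match only the target, and a REINFORCE-style update on the softmax logits shifts sampling toward more discriminative expressions. The candidate set, ambiguity counts, and hyperparameters below are all invented for illustration.

```python
import math
import random

random.seed(0)

candidates = ["a dog", "a brown dog", "the brown dog on the left"]
# Toy "discriminativeness": how many scene objects each expression could match
# (1 = unambiguous, so it earns the highest reward).
ambiguity = {"a dog": 3, "a brown dog": 2, "the brown dog on the left": 1}
logits = {c: 0.0 for c in candidates}

def probs():
    """Softmax over the current logits."""
    z = sum(math.exp(v) for v in logits.values())
    return {c: math.exp(v) / z for c, v in logits.items()}

def sample(p):
    """Draw one expression from the categorical distribution p."""
    r, acc = random.random(), 0.0
    for c, pc in p.items():
        acc += pc
        if r <= acc:
            return c
    return candidates[-1]

lr, baseline = 0.5, 0.5
for _ in range(500):
    p = probs()
    expr = sample(p)
    reward = 1.0 / ambiguity[expr]  # unambiguous expressions earn more reward
    # REINFORCE update: grad of log softmax w.r.t. logit c is 1{c=expr} - p[c]
    for c in candidates:
        logits[c] += lr * (reward - baseline) * ((c == expr) - p[c])

print(max(logits, key=logits.get))  # → the brown dog on the left
```

After training, the most discriminative expression dominates the sampling distribution, which is the behavior the reinforcer's reward is meant to induce.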
Generating Easy-to-Understand Referring Expressions for Target Identifications
Moreover, we regard easy-to-understand sentences as those that humans comprehend both correctly and quickly.
A Fast and Accurate One-Stage Approach to Visual Grounding
We propose a simple, fast, and accurate one-stage approach to visual grounding.
Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation
In addition, we address a key challenge in this multi-task setup, i.e., the prediction conflict, with two innovative designs, namely Consistency Energy Maximization (CEM) and Adaptive Soft Non-Located Suppression (ASNLS).
Large-Scale Adversarial Training for Vision-and-Language Representation Learning
We present VILLA, the first known effort on large-scale adversarial training for vision-and-language (V+L) representation learning.
Unifying Vision-and-Language Tasks via Text Generation
On 7 popular vision-and-language benchmarks (including visual question answering, referring expression comprehension, and visual commonsense reasoning), most of which have previously been modeled as discriminative tasks, our generative approach with a single unified architecture reaches performance comparable to recent task-specific state-of-the-art vision-and-language models.
Airbert: In-domain Pretraining for Vision-and-Language Navigation
Given the scarcity of domain-specific training data and the high diversity of image and language inputs, the generalization of VLN agents to unseen environments remains challenging.
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension
Training a referring expression comprehension (ReC) model for a new visual domain requires collecting referring expressions, and potentially corresponding bounding boxes, for images in the domain.
The Project Dialogism Novel Corpus: A Dataset for Quotation Attribution in Literary Texts
We present the Project Dialogism Novel Corpus, or PDNC, an annotated dataset of quotations for English literary texts.
GRES: Generalized Referring Expression Segmentation
Existing classic RES datasets and methods commonly support single-target expressions only, i.e., one expression refers to one target object.
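The generalized setting can be illustrated with a small sketch: unlike classic ReC/RES, which always returns exactly one target, a GRES-style expression may refer to several objects or to none. The thresholded word-overlap scorer and toy captions below are illustrative stand-ins for a learned model, not the GRES method itself.

```python
# Toy sketch of generalized referring expressions: return every region whose
# (toy) caption matches the expression well enough, so the result can contain
# multiple boxes (multi-target) or be empty (no-target).

def score(expression: str, region_caption: str) -> float:
    """Toy matching score: fraction of expression words found in the caption."""
    e = set(expression.lower().split())
    c = set(region_caption.lower().split())
    return len(e & c) / max(len(e), 1)

def comprehend_generalized(expression, regions, threshold=0.6):
    """Return all boxes scoring at or above the threshold (possibly none)."""
    return [box for box, cap in regions if score(expression, cap) >= threshold]

# Candidate regions: (bounding box (x, y, w, h), toy caption)
regions = [
    ((10, 20, 50, 80), "a man in a red shirt"),
    ((120, 30, 40, 90), "a man holding an umbrella"),
    ((200, 40, 60, 60), "a brown dog on the grass"),
]

print(comprehend_generalized("a man", regions))       # → two boxes
print(comprehend_generalized("a blue car", regions))  # → [] (no target)
```

Returning a set of regions rather than a single argmax is what distinguishes the generalized formulation from the classic single-target one.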