Referring Expression

79 papers with code • 0 benchmarks • 2 datasets

Referring expression comprehension takes an image and a natural-language description and places a bounding box around the instance that the description refers to.
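To make the task definition concrete, here is a minimal sketch of how a predicted box might be compared with an annotated one. The page itself lists no benchmark or metric, so both the `(x_min, y_min, x_max, y_max)` box format and the 0.5 intersection-over-union threshold are assumptions borrowed from common detection practice, and the names `iou` and `is_correct` are hypothetical.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_correct(pred: Box, gold: Box, threshold: float = 0.5) -> bool:
    """Count a prediction as correct when IoU meets the (assumed) threshold."""
    return iou(pred, gold) >= threshold
```

For example, `is_correct((0, 0, 10, 10), (0, 0, 10, 6))` would accept the prediction, since the two boxes overlap with an IoU of 0.6.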


Most implemented papers

UNITER: UNiversal Image-TExt Representation Learning

ChenRocks/UNITER ECCV 2020

Different from previous work that applies joint random masking to both modalities, we use conditional masking on pre-training tasks (i.e., masked language/region modeling is conditioned on full observation of image/text).
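The distinction between joint and conditional masking in the UNITER snippet can be sketched roughly as follows. This is an illustration only, not the code from ChenRocks/UNITER: the function name `conditional_mask`, the `[MASK]` placeholder, the 0.15 masking probability, and the use of strings to stand in for image-region features are all assumptions. The point it shows is that only one modality is masked per example while the other is passed through fully observed.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"  # hypothetical placeholder for a masked token or region


def conditional_mask(
    text: List[str],
    regions: List[str],
    mask_prob: float = 0.15,  # assumed rate, not taken from the page
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[str], str]:
    """Mask one modality at random and leave the other fully observed."""
    rng = rng or random.Random()
    target = rng.choice(["text", "image"])

    def mask(seq: List[str]) -> List[str]:
        return [MASK if rng.random() < mask_prob else item for item in seq]

    if target == "text":
        # Masked language modeling conditioned on the full image.
        return mask(text), list(regions), target
    # Masked region modeling conditioned on the full text.
    return list(text), mask(regions), target
```

Joint random masking would instead apply `mask` to both `text` and `regions` in the same example, so a masked word could lose the very region that would help recover it.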

Modeling Context in Referring Expressions

lichengunc/refer 31 Jul 2016

Humans refer to objects in their environments all the time, especially in dialogue with other people.

CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions

ruotianluo/iep-ref CVPR 2019

Yet there has been evidence that current benchmark datasets suffer from bias, and current state-of-the-art models cannot be easily evaluated on their intermediate reasoning process.

VL-BERT: Pre-training of Generic Visual-Linguistic Representations

jackroos/VL-BERT ICLR 2020

We introduce a new pre-trainable generic representation for visual-linguistic tasks, called Visual-Linguistic BERT (VL-BERT for short).

A Joint Speaker-Listener-Reinforcer Model for Referring Expressions

lichengunc/speaker_listener_reinforcer CVPR 2017

The speaker generates referring expressions, the listener comprehends referring expressions, and the reinforcer introduces a reward function to guide sampling of more discriminative expressions.

Generating Easy-to-Understand Referring Expressions for Target Identifications

mikittt/easy-to-understand-REG ICCV 2019

Moreover, we regard that sentences that are easily understood are those that are comprehended correctly and quickly by humans.

A Fast and Accurate One-Stage Approach to Visual Grounding

zyang-ur/onestage_grounding ICCV 2019

We propose a simple, fast, and accurate one-stage approach to visual grounding, inspired by the following insight.

Large-Scale Adversarial Training for Vision-and-Language Representation Learning

zhegan27/VILLA NeurIPS 2020

We present VILLA, the first known effort on large-scale adversarial training for vision-and-language (V+L) representation learning.

MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding

ashkamath/mdetr 26 Apr 2021

We also investigate the utility of our model as an object detector on a given label set when fine-tuned in a few-shot setting.

Image Segmentation Using Text and Image Prompts

timojl/clipseg CVPR 2022

After training on an extended version of the PhraseCut dataset, our system generates a binary segmentation map for an image based on a free-text prompt or on an additional image expressing the query.
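The snippet does not say how the binary segmentation map is derived. A minimal sketch, assuming the model yields a per-pixel score in [0, 1] for the prompt and that the map is obtained by thresholding, could look like this. The functions `binarize` and `mask_area` and the 0.5 threshold are hypothetical and are not part of timojl/clipseg.

```python
from typing import List


def binarize(scores: List[List[float]], threshold: float = 0.5) -> List[List[int]]:
    """Turn a 2D grid of per-pixel scores into a 0/1 segmentation map."""
    return [[1 if s >= threshold else 0 for s in row] for row in scores]


def mask_area(mask: List[List[int]]) -> int:
    """Number of pixels assigned to the queried object."""
    return sum(sum(row) for row in mask)
```

With the scores `[[0.1, 0.9], [0.5, 0.4]]`, `binarize` returns `[[0, 1], [1, 0]]`, and `mask_area` of that map is 2.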