Referring Expression
117 papers with code • 1 benchmark • 3 datasets
Referring expression comprehension localizes the object instance described by a natural-language expression, typically by placing a bounding box around it in the image.
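Comprehension models are commonly scored by intersection-over-union (IoU) between the predicted and ground-truth boxes, with a prediction counted as correct when IoU exceeds 0.5. A minimal sketch (the box coordinates are illustrative):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Predicted vs. ground-truth box for a hypothetical expression
pred, gold = (10, 10, 50, 50), (20, 20, 60, 60)
correct = iou(pred, gold) >= 0.5  # standard acceptance threshold
```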
Libraries
Use these libraries to find Referring Expression models and implementations.
Most implemented papers
Kosmos-2: Grounding Multimodal Large Language Models to the World
We introduce Kosmos-2, a Multimodal Large Language Model (MLLM), enabling new capabilities of perceiving object descriptions (e.g., bounding boxes) and grounding text to the visual world.
Described Object Detection: Liberating Object Detection with Flexible Expressions
In this paper, we advance them to a more practical setting called Described Object Detection (DOD), expanding category names to flexible language expressions for OVD and overcoming the limitation of REC, which can only ground pre-existing objects.
Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs
Specifically, we present a new method for constructing the instruction tuning dataset at a low cost by leveraging annotations in existing datasets.
Localized Symbolic Knowledge Distillation for Visual Commonsense Models
Empirical results and human evaluations in a zero-shot setup demonstrate that our distillation method results in more precise VL models of reasoning compared to a baseline of passing a generated referring expression to an LLM.
Generation and Comprehension of Unambiguous Object Descriptions
We propose a method that can generate an unambiguous description (known as a referring expression) of a specific object or region in an image, and which can also comprehend or interpret such an expression to infer which object is being described.
Reasoning About Pragmatics with Neural Listeners and Speakers
We present a model for pragmatically describing scenes, in which contrastive behavior results from a combination of inference-driven pragmatics and learned semantics.
Modeling Context Between Objects for Referring Expression Understanding
Our approach uses an LSTM to learn the probability of a referring expression, with input features from a region and a context region.
Colors in Context: A Pragmatic Neural Model for Grounded Language Understanding
We present a model of pragmatic referring expression interpretation in a grounded communication task (identifying colors from descriptions) that draws upon predictions from two recurrent neural network classifiers, a speaker and a listener, unified by a recursive pragmatic reasoning framework.
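The speaker–listener recursion in the two pragmatics papers above can be sketched with a rational-speech-acts-style calculation: a literal listener resolves an utterance by its semantics alone, a pragmatic speaker prefers utterances under which that listener picks the target, and a pragmatic listener inverts the speaker. The lexicon and objects below are illustrative, not taken from either paper:

```python
# Illustrative context: three objects (blue square, blue circle, green square)
# and three utterances with their literal truth values per object.
utterances = ["blue", "circle", "square"]
lexicon = {
    "blue":   [1, 1, 0],
    "circle": [0, 1, 0],
    "square": [1, 0, 1],
}

def normalize(v):
    s = sum(v)
    return [x / s for x in v]

def literal_listener(utt):
    # L0: uniform over objects consistent with the utterance
    return normalize(lexicon[utt])

def pragmatic_speaker(obj_idx):
    # S1: prefers utterances under which L0 picks the target object
    return normalize([literal_listener(u)[obj_idx] for u in utterances])

def pragmatic_listener(utt):
    # L1: inverts S1 under a uniform prior over objects
    u_idx = utterances.index(utt)
    return normalize([pragmatic_speaker(o)[u_idx] for o in range(3)])

# "blue" is literally true of both blue objects, but the pragmatic
# listener shifts mass toward the blue square: to pick out the blue
# circle, the speaker would more likely have said "circle".
posterior = pragmatic_listener("blue")
```

This contrastive shift, with no object uniquely named, is the behavior the neural speaker/listener models above learn from data rather than compute from a hand-written lexicon.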
Grounding Referring Expressions in Images by Variational Context
This is a general yet challenging vision-language task, since it requires not only the localization of objects but also the multimodal comprehension of context --- visual attributes (e.g., "largest", "baby") and relationships (e.g., "behind") that help to distinguish the referent from other objects, especially those of the same category.
MAttNet: Modular Attention Network for Referring Expression Comprehension
In this paper, we address referring expression comprehension: localizing an image region described by a natural language expression.