Referring Expression

117 papers with code • 1 benchmark • 3 datasets

Referring expression comprehension places a bounding box around the image instance that corresponds to a natural-language description, given the expression and the image.
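At its core, the task can be framed as scoring candidate regions against the expression and returning the best match. A minimal sketch, assuming a hypothetical `expression_score` function (real systems learn it from region and text features):

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h)

def comprehend(score: Callable[[Box, str], float],
               boxes: List[Box], expression: str) -> Box:
    """Return the candidate box whose region best matches the expression."""
    return max(boxes, key=lambda b: score(b, expression))

# Toy scorer for illustration only: prefer the leftmost box when the
# expression mentions "left".
def toy_score(box: Box, expression: str) -> float:
    x, y, w, h = box
    return -x if "left" in expression else x

boxes = [(10, 20, 50, 80), (200, 30, 60, 90)]
print(comprehend(toy_score, boxes, "the person on the left"))  # (10, 20, 50, 80)
```

The papers below differ mainly in how that score is computed: grounded language models, modular attention, context modeling, or pragmatic reasoning.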

Most implemented papers

Kosmos-2: Grounding Multimodal Large Language Models to the World

microsoft/unilm 26 Jun 2023

We introduce Kosmos-2, a Multimodal Large Language Model (MLLM), enabling new capabilities of perceiving object descriptions (e.g., bounding boxes) and grounding text to the visual world.

Described Object Detection: Liberating Object Detection with Flexible Expressions

charles-xie/awesome-described-object-detection NeurIPS 2023

In this paper, we advance them to a more practical setting called Described Object Detection (DOD), expanding category names to flexible language expressions for OVD and overcoming REC's limitation of grounding only pre-existing objects.

Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs

sy-xuan/pink 1 Oct 2023

Specifically, we present a new method for constructing the instruction tuning dataset at a low cost by leveraging annotations in existing datasets.

Localized Symbolic Knowledge Distillation for Visual Commonsense Models

jamespark3922/localized-skd NeurIPS 2023

Empirical results and human evaluations in a zero-shot setup demonstrate that our distillation method results in more precise VL models of reasoning compared to a baseline of passing a generated referring expression to an LLM.

Generation and Comprehension of Unambiguous Object Descriptions

mjhucla/Google_Refexp_toolbox CVPR 2016

We propose a method that can generate an unambiguous description (known as a referring expression) of a specific object or region in an image, and which can also comprehend or interpret such an expression to infer which object is being described.

Reasoning About Pragmatics with Neural Listeners and Speakers

jacobandreas/pragma EMNLP 2016

We present a model for pragmatically describing scenes, in which contrastive behavior results from a combination of inference-driven pragmatics and learned semantics.

Modeling Context Between Objects for Referring Expression Understanding

varun-nagaraja/referring-expressions 1 Aug 2016

Our approach uses an LSTM to learn the probability of a referring expression, with input features from a region and a context region.
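The idea of conditioning expression probability on both a region and its context can be sketched without the full LSTM. Below, a bag-of-words stand-in scores each word with a linear layer over the concatenated features; `W` and `vocab` are hypothetical learned parameters, and all values are illustrative:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def expression_log_prob(words, region_feat, context_feat, W, vocab):
    """Log-probability of an expression conditioned on region + context
    features (a simplified stand-in for the paper's LSTM)."""
    feat = np.concatenate([region_feat, context_feat])
    word_probs = softmax(W @ feat)             # distribution over the vocabulary
    idx = [vocab.index(w) for w in words]
    return float(np.sum(np.log(word_probs[idx])))

# Toy example: two-word vocabulary, 2-dim region and context features.
vocab = ["man", "left"]
W = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
lp = expression_log_prob(["man", "left"], np.array([2.0, 0.0]),
                         np.array([2.0, 0.0]), W, vocab)
```

At comprehension time, the same model is run over candidate regions and the region assigning the expression the highest probability is selected.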

Colors in Context: A Pragmatic Neural Model for Grounded Language Understanding

futurulus/colors-in-context TACL 2017

We present a model of pragmatic referring expression interpretation in a grounded communication task (identifying colors from descriptions) that draws upon predictions from two recurrent neural network classifiers, a speaker and a listener, unified by a recursive pragmatic reasoning framework.
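The recursive speaker–listener framework in this line of work is commonly formalized as Rational Speech Acts (RSA): a pragmatic speaker reasons about a literal listener, and a pragmatic listener applies Bayes' rule over that speaker. A minimal numpy sketch with a toy two-utterance, three-color lexicon (all values illustrative):

```python
import numpy as np

# Literal listener L0: P(color | utterance).
# Rows = utterances ("dark", "blue"); columns = colors 0..2.
# Toy lexicon: "dark" is true of colors 0 and 1; "blue" of colors 1 and 2.
L0 = np.array([[0.5, 0.5, 0.0],   # "dark"
               [0.0, 0.5, 0.5]])  # "blue"

def pragmatic_speaker(L0, alpha=1.0):
    """S1 proportional to exp(alpha * log L0): the speaker prefers utterances
    that lead the literal listener to the intended referent."""
    with np.errstate(divide="ignore"):
        util = alpha * np.log(L0)
    S1 = np.exp(util)
    return S1 / S1.sum(axis=0, keepdims=True)   # normalize over utterances

def pragmatic_listener(L0, prior=None):
    """L1 proportional to S1(u | c) * P(c): Bayes over the pragmatic speaker."""
    S1 = pragmatic_speaker(L0)
    if prior is None:
        prior = np.ones(L0.shape[1]) / L0.shape[1]
    joint = S1 * prior
    return joint / joint.sum(axis=1, keepdims=True)  # normalize over colors

L1 = pragmatic_listener(L0)
# Hearing "dark", L1 favors color 0 (2/3) over the ambiguous color 1 (1/3),
# even though the literal listener was split 50/50.
```

This pragmatic strengthening, where an ambiguous utterance is interpreted contrastively, is exactly the behavior the paper models with learned neural speakers and listeners in place of the hand-built lexicon.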

Grounding Referring Expressions in Images by Variational Context

yuleiniu/vc CVPR 2018

This is a general yet challenging vision-language task since it requires not only the localization of objects, but also the multimodal comprehension of context --- visual attributes (e.g., "largest", "baby") and relationships (e.g., "behind") that help to distinguish the referent from other objects, especially those of the same category.

MAttNet: Modular Attention Network for Referring Expression Comprehension

lichengunc/MAttNet CVPR 2018

In this paper, we address referring expression comprehension: localizing an image region described by a natural language expression.