Referring Expression Comprehension

68 papers with code • 8 benchmarks • 8 datasets

Referring Expression Comprehension (REC) is the task of localizing the image region described by a natural language expression.


Most implemented papers

Natural Language Object Retrieval

ronghanghu/natural-language-object-retrieval CVPR 2016

In this paper, we address the task of natural language object retrieval, to localize a target object within a given image based on a natural language query of the object.

MAttNet: Modular Attention Network for Referring Expression Comprehension

lichengunc/MAttNet CVPR 2018

In this paper, we address referring expression comprehension: localizing an image region described by a natural language expression.

Explainable Neural Computation via Stack Neural Module Networks

ronghanghu/snmn ECCV 2018

In complex inferential tasks like question answering, machine learning models must confront two challenges: the need to implement a compositional reasoning process, and, in many applications, the need for this reasoning process to be interpretable to assist users in both development and prediction.

Language-Conditioned Graph Networks for Relational Reasoning

ronghanghu/lcgn ICCV 2019

E.g., conditioning on the "on" relationship to the plate, the object "mug" gathers messages from the object "plate" to update its representation to "mug on the plate", which can easily be consumed by a simple classifier for answer prediction.
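The relational update described above can be sketched as a single language-conditioned message-passing step: one node's features are refined by a message from a related node, gated by a text-derived relation weight. This is a minimal illustrative sketch, not the paper's actual model; all names, shapes, and the gating scheme are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # feature dimension (illustrative)

mug = rng.standard_normal(d)    # node representation for the object "mug"
plate = rng.standard_normal(d)  # node representation for the object "plate"

# Weight derived from the expression for the "on" relation (assumed scalar gate).
w_relation = 0.9

# Message transform; a learned matrix in a real model, random here.
W_msg = rng.standard_normal((d, d)) * 0.1

# Message from "plate" to "mug", gated by the language-conditioned weight.
message = w_relation * (W_msg @ plate)

# Updated node: "mug" now encodes "mug on the plate".
mug_updated = mug + message
```

A classifier over `mug_updated` could then answer questions that depend on the relation, which is the intuition the excerpt describes.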

Talk2Car: Taking Control of Your Self-Driving Car

talk2car/Talk2Car IJCNLP 2019

More specifically, we consider the problem in an autonomous driving setting, where a passenger requests an action that can be associated with an object found in a street scene.

A Real-time Global Inference Network for One-stage Referring Expression Comprehension

luogen1996/Real-time-Global-Inference-Network 7 Dec 2019

Referring Expression Comprehension (REC) is an emerging research topic in computer vision, which refers to detecting the target region in an image given a text description.

Give Me Something to Eat: Referring Expression Comprehension with Commonsense Knowledge

zhanyang-nwpu/rsvg-pytorch 2 Jun 2020

In this case, we need to use commonsense knowledge to identify the objects in the image.

AttnGrounder: Talking to Cars with Attention

i-m-vivek/AttnGrounder 11 Sep 2020

Visual grounding aims to localize a specific object in an image based on a given natural language text query.

Cosine meets Softmax: A tough-to-beat baseline for visual grounding

niveditarufus/CMSVG 13 Sep 2020

In this paper, we present a simple baseline for visual grounding for autonomous driving which outperforms the state of the art methods, while retaining minimal design choices.
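The "cosine meets softmax" idea is simple enough to sketch: embed the query and each candidate region, score regions by cosine similarity to the query, and normalize the scores with a softmax. The snippet below is an assumed minimal version of that recipe; the embedding dimensions, temperature, and function name are illustrative, not the paper's code.

```python
import numpy as np

def cosine_softmax_scores(text_emb, region_embs, temperature=0.1):
    """Score candidate regions by cosine similarity to the text embedding,
    then normalize with a temperature-scaled softmax."""
    t = text_emb / np.linalg.norm(text_emb)
    r = region_embs / np.linalg.norm(region_embs, axis=1, keepdims=True)
    sims = r @ t                        # cosine similarity per region
    logits = sims / temperature
    exp = np.exp(logits - logits.max()) # subtract max for numerical stability
    return exp / exp.sum()

# Toy usage: 5 candidate regions, 16-dim embeddings.
rng = np.random.default_rng(0)
text = rng.standard_normal(16)
regions = rng.standard_normal((5, 16))
probs = cosine_softmax_scores(text, regions)
best = int(np.argmax(probs))            # index of the predicted region
```

The appeal of such a baseline is exactly its minimal design: no cross-modal attention, just a shared embedding space and a similarity-based ranking.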

Language-Conditioned Feature Pyramids for Visual Selection Tasks

Alab-NII/lcfp Findings of the Association for Computational Linguistics 2020

However, few models consider fusing linguistic features with multiple visual features that have different receptive field sizes, even though the appropriate receptive field size for visual features intuitively varies with the expression.
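The multi-scale fusion idea can be illustrated by conditioning each level of a feature pyramid on the same text embedding, so expressions can match visual features at whichever receptive-field size fits best. This is a hypothetical sketch under assumed shapes and a simple element-wise-product fusion; the real model's architecture and fusion operator may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # shared feature dimension (assumed)

# Text embedding for the referring expression.
text = rng.standard_normal(d)

# Visual feature maps at three pyramid scales (H x W x d), coarser maps
# corresponding to larger receptive fields.
pyramid = [rng.standard_normal((s, s, d)) for s in (32, 16, 8)]

def fuse(level, text_vec):
    # Element-wise product fusion, broadcast over all spatial positions.
    return level * text_vec  # shape (H, W, d)

# Each scale gets its own language-conditioned feature map.
fused = [fuse(level, text) for level in pyramid]
```

A selection head could then score locations across all fused levels, letting the expression pick out the scale that matches its referent.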