Referring Expression Comprehension
67 papers with code • 8 benchmarks • 8 datasets
Libraries
Use these libraries to find Referring Expression Comprehension models and implementations.
Latest papers with no code
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
After generating a visual entity or a relation, the LLM emits a communication token that prompts the detection network to propose regions relevant to the sentence generated so far.
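The interleaving of language generation and detection described above can be sketched in miniature. This is an illustrative toy, not CoVLM's implementation: the token name, the stand-in language model, and the stand-in detector are all hypothetical.

```python
# Toy sketch of communicative decoding (hypothetical names, not CoVLM's code):
# the LM emits a special communication token after each visual entity, and the
# token hands the text generated so far to a detection module, which proposes
# regions relevant to that partial sentence.

COMM_TOKEN = "<detect>"  # hypothetical communication token


def toy_lm(prompt):
    """Stand-in LM: yields tokens, inserting COMM_TOKEN after entity words."""
    for word in prompt.split():
        yield word
        if word in {"dog", "frisbee"}:  # pretend these are visual entities
            yield COMM_TOKEN


def toy_detector(partial_text):
    """Stand-in detector: proposes a dummy box for the last word mentioned."""
    entity = partial_text.split()[-1]
    return {"entity": entity, "box": (0, 0, 10, 10)}  # placeholder region


def communicative_decode(prompt):
    """Interleave generation with detection calls triggered by COMM_TOKEN."""
    text, proposals = [], []
    for tok in toy_lm(prompt):
        if tok == COMM_TOKEN:
            proposals.append(toy_detector(" ".join(text)))
        else:
            text.append(tok)
    return " ".join(text), proposals


sentence, regions = communicative_decode("a dog catching a frisbee")
# regions now holds one proposal per visual entity the LM flagged
```

The key design point is that detection is conditioned on the partial sentence, so each proposal reflects the entities and relations expressed up to that token.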
Video Referring Expression Comprehension via Transformer with Content-conditioned Query
Video Referring Expression Comprehension (REC) aims to localize a target object in a video from a natural-language query.
Switching Head-Tail Funnel UNITER for Dual Referring Expression Comprehension with Fetch-and-Carry Tasks
The results show that our method outperforms the baseline method in terms of language comprehension accuracy.
Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input
They achieve exceptional performance on specific tasks but face the particularly challenging problem of modality mismatch, caused by the diversity of input modalities and their fixed structures.
Language-Guided 3D Object Detection in Point Cloud for Autonomous Driving
In this work, we propose a new multi-modal visual grounding task, termed LiDAR Grounding.
NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations
Different functional modules in the programs are implemented as neural networks.
Dynamic Inference With Grounding Based Vision and Language Models
For example, recent image-and-language models with more than 200M parameters have been proposed that learn visual grounding during pre-training and show impressive results on downstream vision-and-language tasks.
RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension
Based on RefCLIP, we further propose the first model-agnostic weakly supervised training scheme for existing REC models, where RefCLIP acts as a mature teacher to generate pseudo-labels for teaching common REC models.
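The teacher-generates-pseudo-labels idea can be sketched generically. This is not RefCLIP's code: the word-overlap scoring function below is a hypothetical stand-in for the CLIP-style image-text matching a real teacher would use.

```python
# Illustrative sketch of pseudo-label generation for weakly supervised REC
# (hypothetical scoring, not RefCLIP's actual matching): a frozen teacher
# scores each candidate box against the expression, and the top-scoring box
# becomes the pseudo-label used to train a student REC model.

def teacher_score(expression, box_label):
    """Stand-in teacher: word overlap between expression and box label."""
    expr_words = set(expression.lower().split())
    return len(expr_words & set(box_label.lower().split()))


def make_pseudo_label(expression, candidate_boxes):
    """Pick the candidate box whose label best matches the expression."""
    return max(candidate_boxes,
               key=lambda b: teacher_score(expression, b["label"]))


candidates = [
    {"box": (5, 5, 50, 50), "label": "red car"},
    {"box": (60, 10, 90, 40), "label": "blue bicycle"},
]
pseudo = make_pseudo_label("the blue bicycle on the right", candidates)
# pseudo is now a (box, label) pair a student model can be trained against
```

The scheme is model-agnostic in the sense that the student only ever sees ordinary box targets, so any existing fully supervised REC model can consume the pseudo-labels unchanged.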
RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension
In this paper, we present the first attempt of semi-supervised learning for REC and propose a strong baseline method called RefTeacher.
One for All: One-stage Referring Expression Comprehension with Dynamic Reasoning
However, one unsolved issue of these models is that the number of reasoning steps needs to be pre-defined and fixed before inference, ignoring the varying complexity of expressions.
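The fixed- versus dynamic-step contrast can be illustrated with a minimal halting loop. This is a hypothetical sketch of the general idea, not the paper's model: the confidence update and threshold are invented for illustration.

```python
# Illustrative sketch of dynamic reasoning depth (hypothetical, not the
# paper's architecture): instead of running a pre-defined, fixed number of
# reasoning steps, iterate until a confidence estimate crosses a threshold,
# so simple expressions halt early and complex ones take more steps.

def reason_step(state):
    """Stand-in reasoning step: nudges confidence upward by a fixed amount."""
    return {"confidence": state["confidence"] + 0.3}


def dynamic_reasoning(initial_confidence, threshold=0.9, max_steps=10):
    """Run reasoning steps until confident enough; return steps taken."""
    state, steps = {"confidence": initial_confidence}, 0
    while state["confidence"] < threshold and steps < max_steps:
        state = reason_step(state)
        steps += 1
    return steps


easy = dynamic_reasoning(0.8)  # near-confident already: halts quickly
hard = dynamic_reasoning(0.1)  # ambiguous expression: needs more steps
```

The `max_steps` cap plays the role of the old fixed budget as a safety bound, while the threshold lets inference cost adapt to the varying complexity of expressions.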