Referring Expression Comprehension
67 papers with code • 8 benchmarks • 8 datasets
Libraries
Use these libraries to find Referring Expression Comprehension models and implementations.
Latest papers with no code
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
After generating a visual entity or a relation, the LLM emits a communication token that prompts the detection network to propose regions relevant to the sentence generated so far.
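The interleaving of language generation and detection described above can be sketched in miniature. This is an illustrative toy, not CoVLM's implementation: the token name, the stand-in language model, and the stand-in detector are all hypothetical.

```python
# Toy sketch of communicative decoding (hypothetical names, not CoVLM's code):
# the LM emits a special communication token after each visual entity, and the
# token hands the text generated so far to a detection module, which proposes
# regions relevant to that partial sentence.

COMM_TOKEN = "<detect>"  # hypothetical communication token


def toy_lm(prompt):
    """Stand-in LM: yields tokens, inserting COMM_TOKEN after entity words."""
    for word in prompt.split():
        yield word
        if word in {"dog", "frisbee"}:  # pretend these are visual entities
            yield COMM_TOKEN


def toy_detector(partial_text):
    """Stand-in detector: proposes a dummy box for the last word mentioned."""
    entity = partial_text.split()[-1]
    return {"entity": entity, "box": (0, 0, 10, 10)}  # placeholder region


def communicative_decode(prompt):
    """Interleave generation with detection calls triggered by COMM_TOKEN."""
    text, proposals = [], []
    for tok in toy_lm(prompt):
        if tok == COMM_TOKEN:
            proposals.append(toy_detector(" ".join(text)))
        else:
            text.append(tok)
    return " ".join(text), proposals


sentence, regions = communicative_decode("a dog catching a frisbee")
# regions now holds one proposal per visual entity the LM flagged
```

The key design point is that detection is conditioned on the partial sentence, so each proposal reflects the entities and relations expressed up to that token.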
Video Referring Expression Comprehension via Transformer with Content-conditioned Query
Video Referring Expression Comprehension (REC) aims to localize a target object in a video from a natural-language query.
Switching Head-Tail Funnel UNITER for Dual Referring Expression Comprehension with Fetch-and-Carry Tasks
The results show that our method outperforms the baseline method in terms of language comprehension accuracy.
Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input
They achieve exceptional performance on specific tasks but face the particularly challenging problem of modality mismatch, caused by the diversity of input modalities and their fixed structures.
Language-Guided 3D Object Detection in Point Cloud for Autonomous Driving
In this work, we propose a new multi-modal visual grounding task, termed LiDAR Grounding.
NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations
Different functional modules in the programs are implemented as neural networks.
Dynamic Inference With Grounding Based Vision and Language Models
For example, recent image-and-language models with more than 200M parameters have been proposed that learn visual grounding during pre-training and show impressive results on downstream vision-and-language tasks.
RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension
Based on RefCLIP, we further propose the first model-agnostic weakly supervised training scheme for existing REC models, where RefCLIP acts as a mature teacher to generate pseudo-labels for teaching common REC models.
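The teacher-generates-pseudo-labels idea can be sketched generically. This is not RefCLIP's code: the word-overlap scoring function below is a hypothetical stand-in for the CLIP-style image-text matching a real teacher would use.

```python
# Illustrative sketch of pseudo-label generation for weakly supervised REC
# (hypothetical scoring, not RefCLIP's actual matching): a frozen teacher
# scores each candidate box against the expression, and the top-scoring box
# becomes the pseudo-label used to train a student REC model.

def teacher_score(expression, box_label):
    """Stand-in teacher: word overlap between expression and box label."""
    expr_words = set(expression.lower().split())
    return len(expr_words & set(box_label.lower().split()))


def make_pseudo_label(expression, candidate_boxes):
    """Pick the candidate box whose label best matches the expression."""
    return max(candidate_boxes,
               key=lambda b: teacher_score(expression, b["label"]))


candidates = [
    {"box": (5, 5, 50, 50), "label": "red car"},
    {"box": (60, 10, 90, 40), "label": "blue bicycle"},
]
pseudo = make_pseudo_label("the blue bicycle on the right", candidates)
# pseudo is now a (box, label) pair a student model can be trained against
```

The scheme is model-agnostic in the sense that the student only ever sees ordinary box targets, so any existing fully supervised REC model can consume the pseudo-labels unchanged.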
RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension
In this paper, we present the first attempt of semi-supervised learning for REC and propose a strong baseline method called RefTeacher.
One for All: One-stage Referring Expression Comprehension with Dynamic Reasoning
However, one unsolved issue of these models is that the number of reasoning steps needs to be pre-defined and fixed before inference, ignoring the varying complexity of expressions.
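The fixed- versus dynamic-step contrast can be illustrated with a minimal halting loop. This is a hypothetical sketch of the general idea, not the paper's model: the confidence update and threshold are invented for illustration.

```python
# Illustrative sketch of dynamic reasoning depth (hypothetical, not the
# paper's architecture): instead of running a pre-defined, fixed number of
# reasoning steps, iterate until a confidence estimate crosses a threshold,
# so simple expressions halt early and complex ones take more steps.

def reason_step(state):
    """Stand-in reasoning step: nudges confidence upward by a fixed amount."""
    return {"confidence": state["confidence"] + 0.3}


def dynamic_reasoning(initial_confidence, threshold=0.9, max_steps=10):
    """Run reasoning steps until confident enough; return steps taken."""
    state, steps = {"confidence": initial_confidence}, 0
    while state["confidence"] < threshold and steps < max_steps:
        state = reason_step(state)
        steps += 1
    return steps


easy = dynamic_reasoning(0.8)  # near-confident already: halts quickly
hard = dynamic_reasoning(0.1)  # ambiguous expression: needs more steps
```

The `max_steps` cap plays the role of the old fixed budget as a safety bound, while the threshold lets inference cost adapt to the varying complexity of expressions.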