Referring Expression Segmentation

14 papers with code • 14 benchmarks • 10 datasets

The task is to label the pixels of an image or video that belong to the object instance referred to by a linguistic expression. In particular, the referring expression (RE) must allow the identification of an individual object in a discourse or scene (the referent). REs unambiguously identify the target instance.
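As a rough illustration of the output format, a prediction for this task can be viewed as a binary mask over the image's pixels, which is then compared against a ground-truth mask for the referent. The sketch below scores that comparison with intersection-over-union (IoU). This page does not say which metric the listed benchmarks use, so IoU is an assumption here, and the `Mask` type and `mask_iou` function are hypothetical names introduced only for illustration.

```python
from typing import List

# Hypothetical representation: one boolean per pixel, stored row by row.
Mask = List[List[bool]]


def mask_iou(pred: Mask, target: Mask) -> float:
    """Intersection-over-union of two binary masks of equal size."""
    same_height = len(pred) == len(target)
    same_width = all(len(p) == len(t) for p, t in zip(pred, target))
    if not (same_height and same_width):
        raise ValueError("masks must have the same height and width")

    # Count pixels set in both masks, and pixels set in at least one mask.
    pairs = [(p, t) for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    intersection = sum(1 for p, t in pairs if p and t)
    union = sum(1 for p, t in pairs if p or t)

    # Two empty masks agree perfectly; this convention is an assumption.
    return intersection / union if union else 1.0


# Toy 2x3 example: the masks share one pixel out of three covered in total.
predicted = [[True, True, False], [False, False, False]]
ground_truth = [[False, True, True], [False, False, False]]
score = mask_iou(predicted, ground_truth)
```

For video, the same comparison would be applied to each frame's mask in turn.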

Greatest papers with code

MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding

ashkamath/mdetr 26 Apr 2021

We also investigate the utility of our model as an object detector on a given label set when fine-tuned in a few-shot setting.

Phrase Grounding Question Answering +4

Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation

luogen1996/MCN CVPR 2020

In addition, we address a key challenge in this multi-task setup, i.e., the prediction conflict, with two innovative designs namely, Consistency Energy Maximization (CEM) and Adaptive Soft Non-Located Suppression (ASNLS).

Referring Expression Comprehension Referring Expression Segmentation

PhraseCut: Language-based Image Segmentation in the Wild

ChenyunWu/PhraseCutDataset CVPR 2020

We consider the problem of segmenting image regions given a natural language phrase, and study it on a novel dataset of 77,262 images and 345,486 phrase-region pairs.

Referring Expression Segmentation Semantic Segmentation

Cross-Modal Progressive Comprehension for Referring Segmentation

spyflying/CMPC-Refseg 15 May 2021

In this paper, we propose a Cross-Modal Progressive Comprehension (CMPC) scheme to effectively mimic human behaviors and implement it as a CMPC-I (Image) module and a CMPC-V (Video) module to improve referring image and video segmentation models.

Referring Expression Segmentation Semantic Segmentation +2

Referring Image Segmentation via Cross-Modal Progressive Comprehension

spyflying/CMPC-Refseg CVPR 2020

In addition to the CMPC module, we further leverage a simple yet effective TGFE module to integrate the reasoned multimodal features from different levels with the guidance of textual information.

Referring Expression Segmentation Semantic Segmentation

Referring Expression Object Segmentation with Caption-Aware Consistency

wenz116/lang2seg 10 Oct 2019

To this end, we propose an end-to-end trainable comprehension network that consists of the language and visual encoders to extract feature representations from both domains.

Referring Expression Segmentation Semantic Segmentation

RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation

miriambellver/refvos 1 Oct 2020

The task of video object segmentation with referring expressions (language-guided VOS) is to, given a linguistic phrase and a video, generate binary masks for the object to which the phrase refers.

Referring Expression Segmentation Video Object Segmentation