Referring Expression Segmentation

The task aims to label the pixels of an image or video that represent an object instance referred to by a linguistic expression. In particular, the referring expression (RE) must allow the identification of an individual object in a discourse or scene (the referent). REs unambiguously identify the target instance.
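To make the input and output of the task concrete, here is a minimal sketch in plain Python. It assumes a toy 3x4 image and a hand-written prediction rather than a real model. The names `ReferringSample` and `mask_iou` are hypothetical, and intersection over union is used here only as a common way to compare binary masks; the description above does not name a metric.

```python
from dataclasses import dataclass
from typing import List

# Binary mask stored as rows of 0/1: 1 = pixel belongs to the referent.
Mask = List[List[int]]


@dataclass
class ReferringSample:
    """Hypothetical container for one example of the task."""
    expression: str     # the referring expression (RE)
    target_mask: Mask   # ground-truth mask for the single referent


def mask_iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two binary masks of equal size."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Two empty masks are treated as a perfect match (an assumption).
    return inter / union if union else 1.0


# Toy example: the expression picks out exactly one object in the scene.
sample = ReferringSample(
    expression="the cup on the left",
    target_mask=[
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 0],
    ],
)

# Hand-written stand-in for a model output (not produced by any real model).
prediction = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]

iou = mask_iou(prediction, sample.target_mask)
print(f"IoU for '{sample.expression}': {iou:.2f}")
```

For video, the same structure would hold one mask per frame, and the comparison would be repeated frame by frame.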

Most implemented papers

Segmentation from Natural Language Expressions

ssharpe42/NLQAC_ObjSeg 20 Mar 2016

To produce pixelwise segmentation for the language expression, we propose an end-to-end trainable recurrent and convolutional network model that jointly learns to process visual and linguistic information.

CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions

ruotianluo/iep-ref CVPR 2019

Yet there has been evidence that current benchmark datasets suffer from bias, and current state-of-the-art models cannot be easily evaluated on their intermediate reasoning process.

RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation

miriambellver/refvos 1 Oct 2020

The task of video object segmentation with referring expressions (language-guided VOS) is to, given a linguistic phrase and a video, generate binary masks for the object to which the phrase refers.

MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding

ashkamath/mdetr 26 Apr 2021

We also investigate the utility of our model as an object detector on a given label set when fine-tuned in a few-shot setting.

SynthRef: Generation of Synthetic Referring Expressions for Object Segmentation

imatge-upc/synthref 8 Jun 2021

Recent advances in deep learning have brought significant progress in visual grounding tasks such as language-guided video object segmentation.

End-to-End Referring Video Object Segmentation with Multimodal Transformers

mttr2021/MTTR CVPR 2022

Due to the complex nature of this multimodal task, which combines text reasoning, video understanding, instance segmentation and tracking, existing approaches typically rely on sophisticated pipelines in order to tackle it.

LAVT: Language-Aware Vision Transformer for Referring Image Segmentation

yz93/lavt-ris CVPR 2022

Referring image segmentation is a fundamental vision-language task that aims to segment out an object referred to by a natural language expression from an image.

MAttNet: Modular Attention Network for Referring Expression Comprehension

lichengunc/MAttNet CVPR 2018

In this paper, we address referring expression comprehension: localizing an image region described by a natural language expression.

Actor and Action Video Segmentation from a Sentence

JerryX1110/awesome-rvos CVPR 2018

This paper strives for pixel-level segmentation of actors and their actions in video content.

Referring Image Segmentation via Recurrent Refinement Networks

liruiyu/referseg_rrn CVPR 2018

We address the problem of image segmentation from natural language descriptions.