Referring Video Object Segmentation
41 papers with code • 4 benchmarks • 3 datasets
Referring video object segmentation aims to segment an object in a video given a natural language expression. Unlike conventional video object segmentation, the task exploits a different type of supervision: the language expression alone identifies which object should be segmented throughout the video.
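At the interface level, a referring-VOS system maps a clip and a referring expression to one mask per frame. The sketch below is a minimal, model-agnostic illustration of that contract; the function name and signature are assumptions for illustration, not an API from any of the papers listed here.

```python
# Hypothetical referring-VOS interface: T frames + one expression -> T masks.
from typing import List
import numpy as np

def segment_referred_object(frames: List[np.ndarray], expression: str) -> List[np.ndarray]:
    """Return one boolean mask of shape (H, W) per input frame.

    A concrete model would encode the expression with a text encoder, encode
    the frames with a visual backbone, fuse the two modalities (commonly with
    cross-attention), and decode a per-frame mask while keeping the referred
    object's identity consistent over time.
    """
    raise NotImplementedError  # placeholder: supplied by a concrete model

# Usage: masks = segment_referred_object(frames, "the dog jumping over the fence")
# len(masks) == len(frames); masks[t] selects the referred object's pixels in frame t.
```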
Most implemented papers
End-to-End Referring Video Object Segmentation with Multimodal Transformers
Due to the complex nature of this multimodal task, which combines text reasoning, video understanding, instance segmentation and tracking, existing approaches typically rely on sophisticated pipelines in order to tackle it.
LISA: Reasoning Segmentation via Large Language Model
In this work, we propose a new segmentation task -- reasoning segmentation.
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces
We evaluate our unified models on various benchmarks.
VISA: Reasoning Video Object Segmentation via Large Language Models
In this paper, we introduce a new task, Reasoning Video Object Segmentation (ReasonVOS).
Cross-Modal Self-Attention Network for Referring Image Segmentation
A gated multi-level fusion module controls the information flow of features at different levels.
Language as Queries for Referring Video Object Segmentation
Referring video object segmentation (R-VOS) is an emerging cross-modal task that aims to segment the target object referred by a language expression in all video frames.
Local-Global Context Aware Transformer for Language-Guided Video Segmentation
We explore the task of language-guided video segmentation (LVS).
Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation
Referring video object segmentation aims to predict foreground labels for objects referred by natural language expressions in videos.
Towards Robust Referring Video Object Segmentation with Cyclic Relational Consensus
Referring Video Object Segmentation (R-VOS) is a challenging task that aims to segment an object in a video based on a linguistic expression.
Multi-Attention Network for Compressed Video Referring Object Segmentation
To segment referred objects directly in the compressed video domain, we propose a multi-attention network that consists of a dual-path dual-attention module and a query-based cross-modal Transformer module.
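Several of the papers above fuse the two modalities with cross-attention in which language tokens query visual features (e.g. the query-based cross-modal Transformer module mentioned in the last entry). The snippet below is a minimal PyTorch sketch of that general mechanism; the class name, dimensions, and fusion details are illustrative assumptions rather than any paper's exact design.

```python
# Illustrative cross-modal attention: language tokens as queries over video features.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_tokens: torch.Tensor, video_feats: torch.Tensor) -> torch.Tensor:
        # text_tokens: (B, L, dim) language embeddings used as queries
        # video_feats: (B, T*H*W, dim) flattened spatio-temporal visual features
        fused, _ = self.attn(query=text_tokens, key=video_feats, value=video_feats)
        return self.norm(text_tokens + fused)  # residual + norm, Transformer-style

# Toy usage: 2 clips, 10 text tokens, 4 frames of 16x16 feature maps.
fusion = CrossModalAttention()
text = torch.randn(2, 10, 256)
video = torch.randn(2, 4 * 16 * 16, 256)
out = fusion(text, video)  # (2, 10, 256): language queries enriched with video context
```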