DVIS: Decoupled Video Instance Segmentation Framework

Video instance segmentation (VIS) is a critical task with diverse applications, including autonomous driving and video editing. Existing methods often underperform on complex and long videos in the real world, primarily due to two factors. First, offline methods are limited by the tightly coupled modeling paradigm, which treats all frames equally and disregards the interdependencies between adjacent frames, introducing excessive noise during long-term temporal alignment. Second, online methods suffer from inadequate utilization of temporal information. To tackle these challenges, we propose a decoupling strategy for VIS that divides it into three independent sub-tasks: segmentation, tracking, and refinement. The efficacy of the decoupling strategy relies on two crucial elements: 1) attaining accurate long-term alignment results via frame-by-frame association during tracking, and 2) effectively utilizing temporal information based on those alignment results during refinement. We introduce a novel referring tracker and temporal refiner to construct the Decoupled VIS framework (DVIS). DVIS achieves new SOTA performance on both VIS and VPS, surpassing the current SOTA methods by 7.3 AP and 9.6 VPQ on the OVIS and VIPSeg datasets, which are among the most challenging and realistic benchmarks. Moreover, thanks to the decoupling strategy, the referring tracker and temporal refiner are extremely lightweight (only 1.69% of the segmenter's FLOPs), enabling efficient training and inference on a single GPU with 11 GB of memory. The code is available at https://github.com/zhang-tao-whu/DVIS.
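
To make the decoupled pipeline concrete, below is a minimal, hypothetical sketch of the three stages described in the abstract (segmentation, tracking, refinement). The module names, attention layout, and tensor shapes are illustrative assumptions and not the authors' implementation; in the sketch, per-frame instance queries stand in for the output of a frozen image segmenter, a "referring tracker" aligns them frame by frame, and a "temporal refiner" then attends over the whole video on top of the aligned queries.

```python
# Hypothetical sketch of the decoupled VIS pipeline: segmentation -> tracking -> refinement.
# All internals below are illustrative assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn


class ReferringTracker(nn.Module):
    """Associates each frame's instance queries with the previous frame's queries
    (frame-by-frame association), producing identity-consistent queries over time."""

    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        # Cross-attention: previous-frame (aligned) queries "refer to" current-frame queries.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, frame_queries):  # frame_queries: (T, N, C) per-frame instance queries
        aligned = []
        prev = frame_queries[0].unsqueeze(0)  # (1, N, C) initial identities
        for t in range(frame_queries.shape[0]):
            cur = frame_queries[t].unsqueeze(0)  # (1, N, C)
            # Align current-frame instances to the identities carried by `prev`.
            out, _ = self.cross_attn(query=prev, key=cur, value=cur)
            aligned.append(out.squeeze(0))
            prev = out  # propagate aligned queries to the next frame
        return torch.stack(aligned)  # (T, N, C), temporally aligned


class TemporalRefiner(nn.Module):
    """Exploits whole-video context on top of the already-aligned instance queries."""

    def __init__(self, dim=256, num_heads=8, num_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)
        self.temporal_encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, aligned_queries):  # (T, N, C)
        # Attend along the time axis independently for each instance slot.
        x = aligned_queries.permute(1, 0, 2)   # (N, T, C)
        x = self.temporal_encoder(x)           # temporal self-attention
        return x.permute(1, 0, 2)              # back to (T, N, C)


# Toy usage: 5 frames, 20 instance queries, 256-dim features.
frame_queries = torch.randn(5, 20, 256)  # would come from a frozen per-frame segmenter
tracker, refiner = ReferringTracker(), TemporalRefiner()
refined = refiner(tracker(frame_queries))
print(refined.shape)  # torch.Size([5, 20, 256])
```

Because tracking and refinement operate only on compact instance queries rather than dense features, modules of this kind can stay small relative to the image segmenter, which is consistent with the lightweight tracker and refiner reported above.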

Benchmark results (Task, Dataset, Model; metric value followed by global rank)

Video Instance Segmentation, OVIS validation, DVIS (Swin-L, Offline):
  mask AP 49.9 (#3), AP50 75.9 (#2), AP75 53.0 (#4), AR1 19.4 (#3), AR10 55.3 (#2)

Video Instance Segmentation, OVIS validation, DVIS (Swin-L, Online):
  mask AP 47.1 (#6), AP50 71.9 (#5), AP75 49.2 (#6), AR1 19.4 (#3), AR10 52.5 (#4)

Video Panoptic Segmentation, VIPSeg, DVIS (Swin-L):
  VPQ 57.6 (#3), STQ 55.3 (#3)

Video Instance Segmentation, YouTube-VIS 2021, DVIS (Swin-L):
  mask AP 60.1 (#6), AP50 83.0 (#3), AP75 68.4 (#4), AR1 47.7 (#7), AR10 65.7 (#3)

Video Instance Segmentation, YouTube-VIS 2022 validation, DVIS (Swin-L):
  mAP_L 45.9 (#3), AP50_L 69.0 (#2), AP75_L 48.8 (#2), AR1_L 37.2 (#2), AR10_L 51.8 (#2)

Video Instance Segmentation, YouTube-VIS validation, DVIS (Swin-L):
  mask AP 64.9 (#6), AP50 88.0 (#4), AP75 72.7 (#4), AR1 56.5 (#3), AR10 70.3 (#3)
