Mask2Former for Video Instance Segmentation

We find Mask2Former also achieves state-of-the-art performance on video instance segmentation without modifying the architecture, the loss, or even the training pipeline. In this report, we show that universal image segmentation architectures trivially generalize to video segmentation by directly predicting 3D segmentation volumes. Specifically, Mask2Former sets a new state-of-the-art of 60.4 AP on YouTubeVIS-2019 and 52.6 AP on YouTubeVIS-2021. We believe Mask2Former is also capable of handling video semantic and panoptic segmentation, given its versatility in image segmentation. We hope this will make state-of-the-art video segmentation research more accessible and bring more attention to designing universal image and video segmentation architectures.
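
The central idea, predicting a 3D (time x height x width) mask per object query instead of a 2D mask, can be illustrated with a short sketch. The snippet below is a minimal, hypothetical illustration and not the authors' implementation: it assumes each query yields a single mask embedding that is shared across frames and multiplied with per-frame pixel-decoder features to produce spatio-temporal mask logits. Names such as `predict_mask_volumes`, `mask_embed`, and `pixel_features`, as well as the tensor shapes, are illustrative assumptions.

```python
# Sketch of query-based video mask prediction (assumed shapes, not the
# authors' code): one mask embedding per object query is dotted with
# per-frame pixel features, giving a T x H x W mask volume per query.
import torch

def predict_mask_volumes(mask_embed: torch.Tensor,
                         pixel_features: torch.Tensor) -> torch.Tensor:
    """
    mask_embed:     (N, Q, C)       one embedding per object query
    pixel_features: (N, T, C, H, W) per-frame features from a pixel decoder
    returns:        (N, Q, T, H, W) spatio-temporal mask logits
    """
    # Contract over the channel dimension C for every frame t and pixel (h, w).
    return torch.einsum("nqc,ntchw->nqthw", mask_embed, pixel_features)

# Example: 10 queries over a 2-frame clip with 256-dim features.
logits = predict_mask_volumes(torch.randn(1, 10, 256),
                              torch.randn(1, 2, 256, 48, 64))
assert logits.shape == (1, 10, 2, 48, 64)
```

Because the per-query prediction already spans all frames of the clip, instance identities are consistent over time without any explicit tracking step.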

| Task | Dataset | Model | mask AP | AP50 | AP75 | AR1 | AR10 |
|---|---|---|---|---|---|---|---|
| Video Instance Segmentation | OVIS validation | Mask2Former-VIS | 16.6 (#37) | 36.9 (#32) | 14.1 (#36) | 9.9 (#27) | 24.7 (#27) |
| Video Instance Segmentation | YouTube-VIS validation | Mask2Former (Swin-L) | 60.4 (#14) | 84.4 (#12) | 67.0 (#13) | - | - |
| Video Instance Segmentation | YouTube-VIS validation | Mask2Former (ResNet-50) | 46.4 (#28) | 68.0 (#27) | 50.0 (#27) | - | - |
| Video Instance Segmentation | YouTube-VIS validation | Mask2Former (ResNet-101) | 49.2 (#24) | 72.8 (#22) | 54.2 (#23) | - | - |

Global leaderboard rank for each metric is given in parentheses.
