TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Video Panoptic Segmentation	Cityscapes-VPS	Video K-Net (Swin-B)	VPQ	62.2	# 3
Video Panoptic Segmentation	Cityscapes-VPS	Video K-Net (Swin-B)	VPQ (thing)	49.8	# 1
Video Panoptic Segmentation	Cityscapes-VPS	Video K-Net (Swin-B)	VPQ (stuff)	71.8	# 2
Video Panoptic Segmentation	KITTI-STEP	Video K-Net (Swin-L)	STQ	74.0	# 1
Video Panoptic Segmentation	KITTI-STEP	Video K-Net (Swin-L)	AQ	73.0	# 1
Video Panoptic Segmentation	KITTI-STEP	Video K-Net (Swin-L)	SQ	75.0	# 1
Video Instance Segmentation	YouTube-VIS validation	Video K-Net (Swin-Base)	mask AP	54.1	# 21
Video Instance Segmentation	YouTube-VIS validation	Video K-Net (Swin-Base)	AP50	79.0	# 17
Video Instance Segmentation	YouTube-VIS validation	Video K-Net (Swin-Base)	AP75	59.6	# 19
Video Instance Segmentation	YouTube-VIS validation	Video K-Net (Swin-Base)	AR1	49.7	# 17
Video Instance Segmentation	YouTube-VIS validation	Video K-Net (Swin-Base)	AR10	59.9	# 17
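
The three KITTI-STEP rows can be checked against each other: in the STEP benchmark, STQ is defined as the geometric mean of association quality (AQ) and segmentation quality (SQ). The sketch below recomputes STQ from the AQ and SQ values in the table; the helper name `stq` is introduced here for illustration and is not taken from the paper's code.

```python
import math


def stq(aq, sq):
    """Geometric mean of association quality (AQ) and segmentation quality (SQ)."""
    return math.sqrt(aq * sq)


# AQ and SQ as listed for Video K-Net (Swin-L) on KITTI-STEP.
value = stq(73.0, 75.0)
print(round(value, 1))  # 74.0
```

The recomputed value matches the reported STQ of 74.0 after rounding to one decimal place.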

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/video-k-net-a-simple-strong-and-unified/video-panoptic-segmentation-on-kitti-step)](https://paperswithcode.com/sota/video-panoptic-segmentation-on-kitti-step?p=video-k-net-a-simple-strong-and-unified)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/video-k-net-a-simple-strong-and-unified/video-panoptic-segmentation-on-cityscapes-vps)](https://paperswithcode.com/sota/video-panoptic-segmentation-on-cityscapes-vps?p=video-k-net-a-simple-strong-and-unified)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/video-k-net-a-simple-strong-and-unified/video-instance-segmentation-on-youtube-vis-1)](https://paperswithcode.com/sota/video-instance-segmentation-on-youtube-vis-1?p=video-k-net-a-simple-strong-and-unified)`

Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation

CVPR 2022 · Xiangtai Li, Wenwei Zhang, Jiangmiao Pang, Kai Chen, Guangliang Cheng, Yunhai Tong, Chen Change Loy

This paper presents Video K-Net, a simple, strong, and unified framework for fully end-to-end video panoptic segmentation. The method is built upon K-Net, which unifies image segmentation via a group of learnable kernels. We observe that these learnable kernels from K-Net, which encode object appearances and contexts, can naturally associate identical instances across video frames. Motivated by this observation, Video K-Net learns to simultaneously segment and track "things" and "stuff" in a video with simple kernel-based appearance modeling and cross-temporal kernel interaction. Despite its simplicity, it achieves state-of-the-art video panoptic segmentation results on Cityscapes-VPS, KITTI-STEP, and VIPSeg without bells and whistles. In particular, on KITTI-STEP, the method achieves almost 12% relative improvement over previous methods. On VIPSeg, Video K-Net achieves almost 15% relative improvement, reaching 39.8% VPQ. We also validate its generalization on video semantic segmentation, where it improves various baselines by 2% on the VSPW dataset. Moreover, we extend K-Net into a clip-level video framework for video instance segmentation, where we obtain 40.5% mAP with a ResNet50 backbone and 54.1% mAP with Swin-Base on the YouTube-VIS 2019 validation set. We hope this simple yet effective method can serve as a new, flexible baseline in unified video segmentation design. Both code and models are released at https://github.com/lxtGH/Video-K-Net.
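
The observation that per-instance kernels can associate identical instances across frames can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes each frame yields one plain vector per kernel and uses greedy cosine-similarity matching as a stand-in for the learned association. The names `cosine`, `associate`, and `threshold` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def associate(prev_kernels, curr_kernels, threshold=0.5):
    """Greedily match current-frame kernels to previous-frame kernels.

    Returns {curr_index: prev_index}. Current kernels with no match above
    the threshold are left out and would start new tracks.
    """
    # Score every (current, previous) pair and visit the best pairs first.
    pairs = sorted(
        ((cosine(c, p), ci, pi)
         for ci, c in enumerate(curr_kernels)
         for pi, p in enumerate(prev_kernels)),
        reverse=True,
    )
    matches, used_prev = {}, set()
    for score, ci, pi in pairs:
        if score < threshold:
            break  # all remaining pairs are weaker
        if ci in matches or pi in used_prev:
            continue  # each kernel is matched at most once
        matches[ci] = pi
        used_prev.add(pi)
    return matches


# Toy kernels (assumed 3-dimensional for readability).
prev = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
curr = [[0.1, 0.9, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
matches = associate(prev, curr)
print(matches)  # current 0 pairs with previous 1, current 1 with previous 0
```

In this toy setting the third current-frame kernel has no similar predecessor, so it stays unmatched, which corresponds to a newly appearing instance.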

Code

lxtgh/video-k-net official

150

Tasks

Image Segmentation

Instance Segmentation

Panoptic Segmentation

Segmentation

Semantic Segmentation

Video Instance Segmentation

Video Panoptic Segmentation

Video Segmentation

Video Semantic Segmentation

Datasets

Cityscapes

YouTube-VIS 2019

VSPW

Cityscapes-VPS

VIPSeg

KITTI-STEP

Results from the Paper

Ranked #1 on Video Panoptic Segmentation on KITTI-STEP (using extra training data)

Methods

K-Net
