Video Panoptic Segmentation

17 papers with code • 3 benchmarks • 4 datasets

Video Panoptic Segmentation is a computer vision task that extends panoptic segmentation by incorporating the temporal dimension. Given a video sequence, the goal is to predict the semantic class of each pixel while consistently tracking object instances: pixels belonging to the same object instance should be assigned the same instance ID throughout the video sequence.
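To make the output format concrete, here is a minimal sketch of one way a video panoptic prediction could be represented. It assumes a label-divisor encoding (semantic class times a constant, plus instance ID); the constant `LABEL_DIVISOR`, the helper names, and the toy classes are illustrative assumptions, not part of any specific benchmark's tooling.

```python
from collections import defaultdict

# Assumed constant: any value larger than the maximum instance ID works.
LABEL_DIVISOR = 1000


def encode(semantic_id, instance_id):
    """Pack a (semantic class, instance ID) pair into one panoptic ID."""
    return semantic_id * LABEL_DIVISOR + instance_id


def decode(panoptic_id):
    """Recover (semantic class, instance ID) from a panoptic ID."""
    return divmod(panoptic_id, LABEL_DIVISOR)


def collect_tubes(video):
    """Group pixels by panoptic ID across all frames.

    video: list of frames, each a list of rows of panoptic IDs.
    Returns {panoptic_id: {frame_index: pixel_count}}.
    """
    tubes = defaultdict(dict)
    for t, frame in enumerate(video):
        for row in frame:
            for pid in row:
                tubes[pid][t] = tubes[pid].get(t, 0) + 1
    return dict(tubes)


# Hypothetical toy classes: road is background, car is a countable object.
ROAD, CAR = 0, 1

# Two 2x3 frames. Car 1 and car 2 keep their instance IDs across frames.
video = [
    [[encode(ROAD, 0), encode(CAR, 1), encode(CAR, 1)],
     [encode(ROAD, 0), encode(ROAD, 0), encode(CAR, 2)]],
    [[encode(ROAD, 0), encode(ROAD, 0), encode(CAR, 1)],
     [encode(ROAD, 0), encode(CAR, 2), encode(CAR, 2)]],
]
tubes = collect_tubes(video)
```

Each key of `tubes` identifies one object (or background region) over the whole clip, which is the temporal consistency the task asks for: a car that changed ID between frames would show up as two separate entries.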

Benchmarks


These leaderboards are used to track progress in Video Panoptic Segmentation.

Dataset	Best Model
VIPSeg	DVIS++ (ViT-L)
Cityscapes-VPS	ViP-DeepLab
KITTI-STEP	Video K-Net (Swin-L)

Libraries

Use these libraries to find Video Panoptic Segmentation models and implementations.

google-research/deeplab2 • 2 papers • 989 stars

zhang-tao-whu/DVIS • 2 papers • 116 stars


Most implemented papers


TarViS: A Unified Approach for Target-based Video Segmentation

Ali2500/TarViS • CVPR 2023

A single TarViS model can be trained jointly on a collection of datasets spanning different tasks, and can hot-swap between tasks during inference without any task-specific retraining.


DVIS: Decoupled Video Instance Segmentation Framework

zhang-tao-whu/DVIS • ICCV 2023

The efficacy of the decoupling strategy relies on two crucial elements: 1) attaining precise long-term alignment outcomes via frame-by-frame association during tracking, and 2) the effective utilization of temporal information predicated on the aforementioned accurate alignment outcomes during refinement.
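The idea of frame-by-frame association can be illustrated with a small sketch. This is not the DVIS implementation: it swaps in a generic greedy cosine-similarity matcher, and the function names, the `threshold` parameter, and the toy embeddings are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def associate(prev, curr, next_id, threshold=0.5):
    """Greedily match current-frame embeddings to previous-frame tracks.

    prev: {track_id: embedding} from the previous frame.
    curr: list of embeddings for objects in the current frame.
    Returns (track IDs aligned with curr, updated next_id).
    """
    pairs = sorted(
        ((cosine(e_prev, e_curr), tid, j)
         for tid, e_prev in prev.items()
         for j, e_curr in enumerate(curr)),
        reverse=True,
    )
    assigned = [None] * len(curr)
    used = set()
    for sim, tid, j in pairs:
        if sim < threshold:
            break  # remaining pairs are even less similar
        if tid in used or assigned[j] is not None:
            continue
        assigned[j] = tid
        used.add(tid)
    # Unmatched detections start new tracks with fresh IDs.
    for j in range(len(curr)):
        if assigned[j] is None:
            assigned[j] = next_id
            next_id += 1
    return assigned, next_id


def track(frames, threshold=0.5):
    """Propagate track IDs through a video, one frame pair at a time."""
    prev, next_id, out = {}, 0, []
    for embs in frames:
        ids, next_id = associate(prev, embs, next_id, threshold)
        out.append(ids)
        prev = dict(zip(ids, embs))
    return out


# Hypothetical 2-D embeddings: objects swap order in frame 1, and a new
# object replaces one of them in frame 2.
frames = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.1, 0.9], [0.9, 0.1]],
    [[0.9, 0.2], [-1.0, 0.0]],
]
ids_per_frame = track(frames)
```

Because each frame is matched only against its immediate predecessor, identities are carried forward step by step; a refinement stage that uses longer temporal context would operate on the aligned sequence this produces.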


1st Place Solution for PVUW Challenge 2023: Video Panoptic Segmentation

zhang-tao-whu/DVIS • 7 Jun 2023

In this report, we successfully validated the effectiveness of the decoupling strategy in video panoptic segmentation.


Tracking Anything with Decoupled Video Segmentation

hkchengrex/Tracking-Anything-with-DEVA • ICCV 2023

To 'track anything' without training on video data for every individual task, we develop a decoupled video segmentation approach (DEVA), composed of task-specific image-level segmentation and class/task-agnostic bi-directional temporal propagation.


MaXTron: Mask Transformer with Trajectory Attention for Video Panoptic Segmentation

tacju/maxtron • 30 Nov 2023

To alleviate the issue, we propose to adapt the trajectory attention for both the dense pixel features and object queries, aiming to improve the short-term and long-term tracking results, respectively.


DVIS++: Improved Decoupled Framework for Universal Video Segmentation

zhang-tao-whu/DVIS_Plus • 20 Dec 2023

We present the Decoupled VIdeo Segmentation (DVIS) framework, a novel approach for the challenging task of universal video segmentation, including video instance segmentation (VIS), video semantic segmentation (VSS), and video panoptic segmentation (VPS).


UniVS: Unified and Universal Video Segmentation with Prompts as Queries

minghanli/univs • 28 Feb 2024

Despite the recent advances in unified image segmentation (IS), developing a unified video segmentation (VS) model remains a challenge.

