Video Instance Segmentation

93 papers with code • 8 benchmarks • 8 datasets

The goal of video instance segmentation is simultaneous detection, segmentation and tracking of instances in videos. In other words, it extends the image instance segmentation problem to the video domain for the first time.

To facilitate research on this new task, a large-scale benchmark called YouTube-VIS was built. It consists of 2,883 high-resolution YouTube videos, a 40-category label set and 131k high-quality instance masks.
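Because an instance has to be detected, segmented and tracked jointly, a prediction is naturally a sequence of masks, one per frame, rather than a single mask. A minimal sketch of how two such mask sequences could be compared is given here, assuming a spatio-temporal IoU that sums intersections and unions over all frames. The names `video_mask_iou`, `Mask` and `Pixel`, and the set-of-pixels mask representation, are illustrative assumptions and not part of any benchmark toolkit.

```python
from typing import List, Optional, Set, Tuple

Pixel = Tuple[int, int]       # (row, col), hypothetical representation
Mask = Optional[Set[Pixel]]   # None means the instance is absent in that frame


def video_mask_iou(pred: List[Mask], gt: List[Mask]) -> float:
    """Spatio-temporal IoU between two mask sequences of equal length.

    Intersections and unions are accumulated over all frames, so a track
    that is missing in some frames is penalised there.
    """
    if len(pred) != len(gt):
        raise ValueError("tracks must cover the same number of frames")
    inter = 0
    union = 0
    for p, g in zip(pred, gt):
        p = p or set()   # treat an absent mask as empty
        g = g or set()
        inter += len(p & g)
        union += len(p | g)
    return inter / union if union else 0.0


# Toy example: a 3-frame video where the prediction misses the last frame.
pred_track = [{(0, 0), (0, 1)}, {(1, 1)}, None]
gt_track = [{(0, 0)}, {(1, 1), (1, 2)}, {(2, 2)}]
print(video_mask_iou(pred_track, gt_track))  # 2 shared pixels / 5 in the union = 0.4
```

Accumulating over frames before dividing means a prediction with good per-frame masks but a wrong or incomplete track still scores low, which is the property that separates this task from per-image instance segmentation.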

Most implemented papers

Simple Online and Realtime Tracking with a Deep Association Metric

nwojke/deep_sort 21 Mar 2017

Simple Online and Realtime Tracking (SORT) is a pragmatic approach to multiple object tracking with a focus on simple, effective algorithms.

Video Instance Segmentation

Epiphqny/VisTR ICCV 2019

The goal of this new task is simultaneous detection, segmentation and tracking of instances in videos.

Mask2Former for Video Instance Segmentation

facebookresearch/Mask2Former 20 Dec 2021

We find Mask2Former also achieves state-of-the-art performance on video instance segmentation without modifying the architecture, the loss or even the training pipeline.

Instances as Queries

hustvl/QueryInst ICCV 2021

The key insight of QueryInst is to leverage the intrinsic one-to-one correspondence in object queries across different stages, as well as one-to-one correspondence between mask RoI features and object queries in the same stage.

Temporally Efficient Vision Transformer for Video Instance Segmentation

hustvl/tevit CVPR 2022

To effectively and efficiently model the crucial temporal information within a video clip, we propose a Temporally Efficient Vision Transformer (TeViT) for video instance segmentation (VIS).

DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries

skyworkai/daq-vs 29 Mar 2024

Modern video segmentation methods adopt object queries to perform inter-frame association and demonstrate satisfactory performance in tracking continuously appearing objects despite large-scale motion and transient occlusion.

End-to-End Video Instance Segmentation with Transformers

Epiphqny/VisTR CVPR 2021

Here, we propose a new video instance segmentation framework built upon Transformers, termed VisTR, which views the VIS task as a direct end-to-end parallel sequence decoding/prediction problem.

Occluded Video Instance Segmentation: A Benchmark

haochenheheda/lvvis 2 Feb 2021

On the OVIS dataset, the highest AP achieved by state-of-the-art algorithms is only 16.3, which reveals that we are still at a nascent stage for understanding objects, instances, and videos in a real-world scenario.

Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations

wvangansbeke/Revisiting-Contrastive-SSL NeurIPS 2021

Contrastive self-supervised learning has outperformed supervised pretraining on many downstream tasks like segmentation and object detection.

UVO Challenge on Video-based Open-World Segmentation 2021: 1st Place Solution

dulucas/uvo_challenge 22 Oct 2021

In this report, we introduce our (pretty straightforward) two-step "detect-then-match" video instance segmentation method.
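The general idea of a two-step "detect-then-match" pipeline can be illustrated with a small sketch: masks are first produced independently per frame, then linked into tracks across frames. The version below is not the method from the report. It assumes greedy matching on mask IoU against each track's most recent mask, and `link_detections`, `mask_iou` and the `iou_threshold` parameter are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]   # (row, col), hypothetical representation
Mask = Set[Pixel]


def mask_iou(a: Mask, b: Mask) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def link_detections(frames: List[List[Mask]], iou_threshold: float = 0.5) -> List[List[int]]:
    """Assign a track id to every per-frame detection (greedy IoU matching)."""
    next_id = 0
    last_mask: Dict[int, Mask] = {}   # track id -> most recent mask of that track
    ids_per_frame: List[List[int]] = []
    for detections in frames:
        # Score every (track, detection) pair and visit the best pairs first.
        pairs = sorted(
            ((mask_iou(m, d), tid, j)
             for tid, m in last_mask.items()
             for j, d in enumerate(detections)),
            reverse=True,
        )
        assigned = [-1] * len(detections)
        used_tracks = set()
        for iou, tid, j in pairs:
            if iou < iou_threshold:
                break   # remaining pairs overlap too little to be matched
            if tid in used_tracks or assigned[j] != -1:
                continue
            assigned[j] = tid
            used_tracks.add(tid)
        # Unmatched detections start new tracks; all tracks seen here are refreshed.
        for j, d in enumerate(detections):
            if assigned[j] == -1:
                assigned[j] = next_id
                next_id += 1
            last_mask[assigned[j]] = d
        ids_per_frame.append(assigned)
    return ids_per_frame


# Toy example: two objects whose detection order is swapped in the second frame.
frames = [
    [{(0, 0), (0, 1)}, {(5, 5), (5, 6)}],
    [{(5, 5), (5, 6), (5, 7)}, {(0, 0), (0, 1)}],
]
print(link_detections(frames))  # [[0, 1], [1, 0]]
```

Greedy matching keeps the sketch short; a practical system might instead use an optimal assignment or appearance features for the matching step.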