Video Instance Segmentation

43 papers with code • 4 benchmarks • 4 datasets

The goal of video instance segmentation is the simultaneous detection, segmentation, and tracking of instances in videos. In other words, the task extends the image instance segmentation problem to the video domain.

To facilitate research on this task, a large-scale benchmark called YouTube-VIS was built. It consists of 2,883 high-resolution YouTube videos, a 40-category label set, and 131k high-quality instance masks.
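To make the "segmentation plus tracking" part of the task definition concrete, here is a minimal sketch of one simple way to link per-frame instance masks into tracks. It is not the method of any paper listed on this page. It assumes masks are already predicted for each frame, stores each mask as a set of pixel coordinates, and greedily matches masks between consecutive frames by mask IoU. The names `mask_iou`, `link_instances`, and the `iou_threshold` parameter are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]
Mask = Set[Pixel]  # a binary mask stored as the set of its foreground pixels


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two pixel-set masks."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def link_instances(frames: List[List[Mask]], iou_threshold: float = 0.5) -> List[List[int]]:
    """Assign a track ID to every mask in every frame (illustrative only).

    Each mask is greedily matched to the best-overlapping, still unmatched
    track from the previous frame. Masks without a match start a new track.
    Tracks that vanish for a frame are not recovered in this sketch.
    """
    next_id = 0
    prev: Dict[int, Mask] = {}  # track ID -> its mask in the previous frame
    all_ids: List[List[int]] = []
    for masks in frames:
        # Score every (previous track, current mask) pair, best overlap first.
        pairs = sorted(
            (
                (mask_iou(prev_mask, mask), track_id, i)
                for track_id, prev_mask in prev.items()
                for i, mask in enumerate(masks)
            ),
            reverse=True,
        )
        ids = [-1] * len(masks)
        used_tracks = set()
        for iou, track_id, i in pairs:
            if iou < iou_threshold:
                break  # remaining pairs overlap even less
            if ids[i] == -1 and track_id not in used_tracks:
                ids[i] = track_id
                used_tracks.add(track_id)
        # Any mask left unmatched becomes a new instance track.
        for i in range(len(masks)):
            if ids[i] == -1:
                ids[i] = next_id
                next_id += 1
        prev = {ids[i]: masks[i] for i in range(len(masks))}
        all_ids.append(ids)
    return all_ids
```

Calling `link_instances` on a list of frames returns, for each frame, one track ID per mask, so the same object keeps the same ID as long as its masks overlap sufficiently from one frame to the next.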


Most implemented papers

Simple Online and Realtime Tracking with a Deep Association Metric

nwojke/deep_sort 21 Mar 2017

Simple Online and Realtime Tracking (SORT) is a pragmatic approach to multiple object tracking with a focus on simple, effective algorithms.

Video Instance Segmentation

Epiphqny/VisTR ICCV 2019

The goal of this new task is simultaneous detection, segmentation and tracking of instances in videos.

Instances as Queries

hustvl/QueryInst ICCV 2021

The key insight of QueryInst is to leverage the intrinsic one-to-one correspondence in object queries across different stages, as well as one-to-one correspondence between mask RoI features and object queries in the same stage.

End-to-End Video Instance Segmentation with Transformers

Epiphqny/VisTR CVPR 2021

Here, we propose a new video instance segmentation framework built upon Transformers, termed VisTR, which views the VIS task as a direct end-to-end parallel sequence decoding/prediction problem.

Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations

wvangansbeke/Revisiting-Contrastive-SSL NeurIPS 2021

Contrastive self-supervised learning has outperformed supervised pretraining on many downstream tasks like segmentation and object detection.

Temporally Efficient Vision Transformer for Video Instance Segmentation

hustvl/tevit 18 Apr 2022

To effectively and efficiently model the crucial temporal information within a video clip, we propose a Temporally Efficient Vision Transformer (TeViT) for video instance segmentation (VIS).

Efficient Video Object Segmentation via Network Modulation

linjieyangsc/video_seg CVPR 2018

Video object segmentation aims to segment a specific object throughout a video sequence, given only an annotated first frame.

007: Democratically Finding The Cause of Packet Drops

behnazak/Vigil-007SourceCode 20 Feb 2018

Network failures continue to plague datacenter operators as their symptoms may not have direct correlation with where or why they occur.

Instance-wise Depth and Motion Learning from Monocular Videos

SeokjuLee/Insta-DM 19 Dec 2019

We present an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion and depth in a monocular camera setup without supervision.

Learning a Spatio-Temporal Embedding for Video Instance Segmentation

jdc08161063/spatio-temporal-embedding 19 Dec 2019

We present a novel embedding approach for video instance segmentation.