Video Instance Segmentation
60 papers with code • 8 benchmarks • 8 datasets
The goal of video instance segmentation is simultaneous detection, segmentation and tracking of instances in videos. In other words, it extends the image instance segmentation problem to the video domain for the first time.
To facilitate research on this new task, a large-scale benchmark called YouTube-VIS was built. It consists of 2,883 high-resolution YouTube videos, a 40-category label set, and 131k high-quality instance masks.
Most implemented papers
Simple Online and Realtime Tracking with a Deep Association Metric
Simple Online and Realtime Tracking (SORT) is a pragmatic approach to multiple object tracking with a focus on simple, effective algorithms.
Video Instance Segmentation
The goal of this new task is simultaneous detection, segmentation and tracking of instances in videos.
Instances as Queries
The key insight of QueryInst is to leverage the intrinsic one-to-one correspondence in object queries across different stages, as well as one-to-one correspondence between mask RoI features and object queries in the same stage.
Mask2Former for Video Instance Segmentation
We find Mask2Former also achieves state-of-the-art performance on video instance segmentation without modifying the architecture, the loss or even the training pipeline.
End-to-End Video Instance Segmentation with Transformers
Here, we propose a new video instance segmentation framework built upon Transformers, termed VisTR, which views the VIS task as a direct end-to-end parallel sequence decoding/prediction problem.
Temporally Efficient Vision Transformer for Video Instance Segmentation
To effectively and efficiently model the crucial temporal information within a video clip, we propose a Temporally Efficient Vision Transformer (TeViT) for video instance segmentation (VIS).
Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations
Contrastive self-supervised learning has outperformed supervised pretraining on many downstream tasks like segmentation and object detection.
UVO Challenge on Video-based Open-World Segmentation 2021: 1st Place Solution
In this report, we introduce our (pretty straightforward) two-step "detect-then-match" video instance segmentation method.
SeqFormer: Sequential Transformer for Video Instance Segmentation
Nevertheless, we observe that a stand-alone instance query suffices for capturing a time sequence of instances in a video, but attention mechanisms should be applied to each frame independently.
RankSeg: Adaptive Pixel Classification with Image Category Ranking for Segmentation
Given an input image or video, our framework first conducts multi-label classification over the complete label set, then sorts the labels and selects a small subset according to their class confidence scores.
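The rank-and-select step described in the RankSeg summary can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `select_label_subset`, the subset size `k`, and the example scores are all hypothetical, and the scores stand in for the output of a multi-label classifier.

```python
from typing import Dict, List


def select_label_subset(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k labels with the highest confidence, best first.

    `scores` maps each label to a confidence score (assumed to come from
    a multi-label classifier); `k` is a hypothetical subset size.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # Sort labels by descending confidence, then keep the top k.
    ranked = sorted(scores, key=lambda label: scores[label], reverse=True)
    return ranked[:k]


# Made-up confidences for one input, for illustration only.
scores = {"person": 0.92, "dog": 0.71, "car": 0.05, "tree": 0.33, "boat": 0.01}
subset = select_label_subset(scores, k=3)
print(subset)  # ['person', 'dog', 'tree']
```

Under these assumptions, a downstream pixel classifier would only need to choose among the selected labels rather than the full label set.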