Video Instance Segmentation
43 papers with code • 4 benchmarks • 4 datasets
The goal of video instance segmentation is simultaneous detection, segmentation and tracking of instances in videos. In words, it is the first time that the image instance segmentation problem is extended to the video domain.
To facilitate research on this new task, a large-scale benchmark called YouTube-VIS, which consists of 2,883 high-resolution YouTube videos, a 40-category label set and 131k high-quality instance masks is built.
Simple Online and Realtime Tracking (SORT) is a pragmatic approach to multiple object tracking with a focus on simple, effective algorithms.
Here, we propose a new video instance segmentation framework built upon Transformers, termed VisTR, which views the VIS task as a direct end-to-end parallel sequence decoding/prediction problem.
Contrastive self-supervised learning has outperformed supervised pretraining on many downstream tasks like segmentation and object detection.
To effectively and efficiently model the crucial temporal information within a video clip, we propose a Temporally Efficient Vision Transformer (TeViT) for video instance segmentation (VIS).
Video object segmentation targets at segmenting a specific object throughout a video sequence, given only an annotated first frame.
Network failures continue to plague datacenter operators as their symptoms may not have direct correlation with where or why they occur.
We present an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion and depth in a monocular camera setup without supervision.