Video Segmentation
105 papers with code • 1 benchmark • 9 datasets
Most implemented papers
One-Shot Video Object Segmentation
This paper tackles the task of semi-supervised video object segmentation, i.e., the separation of an object from the background in a video, given the mask of the first frame.
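To make the input and output of this task concrete, here is a minimal sketch of the semi-supervised setting: one annotated first-frame mask goes in, one mask per frame comes out. It is not the paper's method. The names `propagate_first_mask` and `jaccard` are hypothetical, the baseline simply copies the first-frame mask forward, and the region-overlap score is an illustrative assumption that the listing does not mention.

```python
from typing import List

# One binary mask per frame: True marks pixels that belong to the object.
Mask = List[List[bool]]


def propagate_first_mask(first_mask: Mask, num_frames: int) -> List[Mask]:
    """Naive baseline (assumption, not a published method):
    reuse the annotated first-frame mask for every frame of the video."""
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    # Copy each row so later edits to one frame do not affect the others.
    return [[row[:] for row in first_mask] for _ in range(num_frames)]


def jaccard(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two same-sized binary masks."""
    pairs = [(p, t) for pr, tr in zip(pred, truth) for p, t in zip(pr, tr)]
    inter = sum(1 for p, t in pairs if p and t)
    union = sum(1 for p, t in pairs if p or t)
    # Two empty masks agree perfectly by convention.
    return inter / union if union else 1.0
```

A real method would replace `propagate_first_mask` with a model that adapts to the object's motion and appearance; the baseline only fixes the shape of the problem.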
Mask2Former for Video Instance Segmentation
We find Mask2Former also achieves state-of-the-art performance on video instance segmentation without modifying the architecture, the loss or even the training pipeline.
YouTube-VOS: Sequence-to-Sequence Video Object Segmentation
End-to-end sequential learning to explore spatial-temporal features for video segmentation is largely limited by the scale of available video segmentation datasets, i.e., even the largest video segmentation dataset only contains 90 short video clips.
CCNet: Criss-Cross Attention for Semantic Segmentation
Compared with the non-local block, the proposed recurrent criss-cross attention module requires 11x less GPU memory.
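The memory saving comes from where each position attends: a non-local block relates every position to every other position, while criss-cross attention relates each position only to those in its own row and column. The sketch below counts attention-map entries for a single pass over an H x W feature map under that assumption. The function names are hypothetical, and the count covers only the attention map, so it does not reproduce the quoted 11x figure, which concerns overall GPU memory.

```python
def nonlocal_attention_entries(height: int, width: int) -> int:
    """Every position attends to every position: (H*W)^2 entries."""
    positions = height * width
    return positions * positions


def criss_cross_attention_entries(height: int, width: int) -> int:
    """Each position attends to its row and column only.
    The position itself is shared by both, hence H + W - 1 targets."""
    return height * width * (height + width - 1)


def attention_map_ratio(height: int, width: int) -> float:
    """How many times larger the non-local attention map is."""
    return nonlocal_attention_entries(height, width) / criss_cross_attention_entries(
        height, width
    )
```

The ratio simplifies to H*W / (H + W - 1), so the gap widens as the feature map grows.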
Video Object Segmentation with Re-identification
Specifically, our Video Object Segmentation with Re-identification (VS-ReID) model includes a mask propagation module and a ReID module.
Physarum Powered Differentiable Linear Programming Layers and Applications
We describe our development and show the use of our solver in a video segmentation task and meta-learning for few-shot learning.
EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations
VISOR annotates videos from EPIC-KITCHENS, which comes with a new set of challenges not encountered in current video segmentation datasets.
DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries
Modern video segmentation methods adopt object queries to perform inter-frame association and demonstrate satisfactory performance in tracking continuously appearing objects despite large-scale motion and transient occlusion.
Rethinking the Evaluation of Video Summaries
Video summarization is a technique to create a short skim of the original video while preserving the main stories/content.
Temporal Aggregate Representations for Long-Range Video Understanding
Future prediction, especially in long-range videos, requires reasoning from current and past observations.