Video Semantic Segmentation
325 papers with code • 5 benchmarks • 8 datasets
Latest papers with no code
Point-VOS: Pointing Up Video Object Segmentation
We propose a novel Point-VOS task with a spatio-temporally sparse point-wise annotation scheme that substantially reduces the annotation effort.
Is Two-shot All You Need? A Label-efficient Approach for Video Segmentation in Breast Ultrasound
Breast lesion segmentation from breast ultrasound (BUS) videos could assist in early diagnosis and treatment.
Self-supervised Video Object Segmentation with Distillation Learning of Deformable Attention
This is enabled by a deformable attention mechanism, where the keys and values capturing the memory of a video sequence in the attention module have flexible locations updated across frames.
Explore Synergistic Interaction Across Frames for Interactive Video Object Segmentation
Interactive Video Object Segmentation (iVOS) is a challenging task that requires real-time human-computer interaction.
Understanding Video Transformers via Universal Concept Discovery
Concretely, we seek to explain the decision-making process of video transformers based on high-level, spatiotemporal concepts that are automatically discovered.
No More Shortcuts: Realizing the Potential of Temporal Self-Supervision
To address these issues, we propose 1) a more challenging reformulation of temporal self-supervision as frame-level (rather than clip-level) recognition tasks and 2) an effective augmentation strategy to mitigate shortcuts.
Appearance-based Refinement for Object-Centric Motion Segmentation
The goal of this paper is to discover, segment, and track independently moving objects in complex visual scenes.
Artificial intelligence optical hardware empowers high-resolution hyperspectral video understanding at 1.2 Tb/s
The technology platform combines artificial intelligence hardware, processing information optically, with state-of-the-art machine vision networks, resulting in a data processing speed of 1.2 Tb/s with hundreds of frequency bands and megapixel spatial resolution at video rates.
TAM-VT: Transformation-Aware Multi-scale Video Transformer for Segmentation and Tracking
In this work, we propose a novel, clip-based DETR-style encoder-decoder architecture, which focuses on systematically analyzing and addressing the aforementioned challenges.
GenDeF: Learning Generative Deformation Field for Video Generation
We offer a new perspective on approaching the task of video generation.