Video Semantic Segmentation
326 papers with code • 5 benchmarks • 8 datasets
Libraries
Use these libraries to find Video Semantic Segmentation models and implementationsLatest papers
RAP-SAM: Towards Real-Time All-Purpose Segment Anything
Segment Anything Model (SAM) is one remarkable model that can achieve generalized segmentation.
1st Place Solution for 5th LSVOS Challenge: Referring Video Object Segmentation
The recent transformer-based models have dominated the Referring Video Object Segmentation (RVOS) task due to the superior performance.
Tracking with Human-Intent Reasoning
The perception component then generates the tracking results based on the embeddings.
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces
We evaluate our unified models on various benchmarks.
DVIS++: Improved Decoupled Framework for Universal Video Segmentation
We present the \textbf{D}ecoupled \textbf{VI}deo \textbf{S}egmentation (DVIS) framework, a novel approach for the challenging task of universal video segmentation, including video instance segmentation (VIS), video semantic segmentation (VSS), and video panoptic segmentation (VPS).
AutoVisual Fusion Suite: A Comprehensive Evaluation of Image Segmentation and Voice Conversion Tools on HuggingFace Platform
This study presents a comprehensive evaluation of tools available on the HuggingFace platform for two pivotal applications in artificial intelligence: image segmentation and voice conversion.
Hierarchical Graph Pattern Understanding for Zero-Shot VOS
However, existing optical flow-based methods have a significant dependency on optical flow, which results in poor performance when the optical flow estimation fails for a particular scene.
Semi-supervised Active Learning for Video Action Detection
First, we demonstrate its effectiveness on video action detection where the proposed approach outperforms prior works in semi-supervised and weakly-supervised learning along with several baseline approaches in both UCF101-24 and JHMDB-21.
Flexible visual prompts for in-context learning in computer vision
Additionally, we propose a technique for support set selection, which involves choosing the most relevant images to include in this set.
Efficient Multimodal Semantic Segmentation via Dual-Prompt Learning
Existing approaches often fully fine-tune a dual-branch encoder-decoder framework with a complicated feature fusion strategy for achieving multimodal semantic segmentation, which is training-costly due to the massive parameter updates in feature extraction and fusion.