Video Semantic Segmentation

150 papers with code • 2 benchmarks • 5 datasets


Greatest papers with code

FEELVOS: Fast End-to-End Embedding Learning for Video Object Segmentation

tensorflow/models CVPR 2019

Many of the recent successful methods for video object segmentation (VOS) are overly complicated, rely heavily on fine-tuning on the first frame, and/or are slow, and are hence of limited practical use.

Fine-tuning • Semantic Segmentation • +2

Fully Convolutional Networks for Semantic Segmentation

pytorch/vision CVPR 2015

Convolutional networks are powerful visual models that yield hierarchies of features.

Ranked #1 on Semantic Segmentation on NYU Depth v2 (Mean Accuracy metric)

Fine-tuning • Real-Time Semantic Segmentation • +2

CCNet: Criss-Cross Attention for Semantic Segmentation

open-mmlab/mmsegmentation ICCV 2019

Compared with the non-local block, the proposed recurrent criss-cross attention module requires 11x less GPU memory usage.

Ranked #6 on Semantic Segmentation on FoodSeg103 (using extra training data)

Human Parsing • Instance Segmentation • +4
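
The memory claim in the CCNet excerpt above can be made more concrete with a rough count of attention-map entries. The sketch below is an illustration only, not the authors' code: it assumes a dense non-local block lets each of the H×W positions attend to all H×W positions, while a criss-cross block lets each position attend only to its own row and column (H + W - 1 positions). The function names and the 97×97 feature-map size are hypothetical, and the count covers one attention pass only, so it is not expected to reproduce the measured 11x GPU memory figure.

```python
def nonlocal_attention_entries(height: int, width: int) -> int:
    """Entries in a dense attention map: every position attends to every position."""
    n = height * width
    return n * n


def criss_cross_attention_entries(height: int, width: int) -> int:
    """Entries in a criss-cross attention map: each position attends to its row and column."""
    # Row contributes `width` positions, column contributes `height` positions,
    # and the position itself is shared between them, hence the -1.
    return height * width * (height + width - 1)


if __name__ == "__main__":
    h, w = 97, 97  # hypothetical feature-map size, chosen only for illustration
    dense = nonlocal_attention_entries(h, w)
    sparse = criss_cross_attention_entries(h, w)
    print(f"dense entries:       {dense}")
    print(f"criss-cross entries: {sparse}")
    print(f"ratio:               {dense / sparse:.1f}")
```

The ratio grows with the feature-map size, since the dense map scales with (H×W)² while the criss-cross map scales with H×W×(H + W - 1).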

Generating Masks from Boxes by Mining Spatio-Temporal Consistencies in Videos

visionml/pytracking ICCV 2021

This effectively limits the performance and generalization capabilities of existing video segmentation methods.

Semantic Segmentation • Video Object Segmentation • +2

Learning What to Learn for Video Object Segmentation

visionml/pytracking ECCV 2020

This allows us to achieve a rich internal representation of the target in the current frame, significantly increasing the segmentation accuracy of our approach.

Few-Shot Learning • One-shot visual object segmentation • +3

Deep Feature Flow for Video Recognition

open-mmlab/mmtracking CVPR 2017

Yet, it is non-trivial to transfer the state-of-the-art image recognition networks to videos as per-frame evaluation is too slow and unaffordable.

Video Recognition • Video Semantic Segmentation

Rethinking Self-supervised Correspondence Learning: A Video Frame-level Similarity Perspective

open-mmlab/mmaction2 ICCV 2021

To learn generalizable representations for correspondence at large scale, a variety of self-supervised pretext tasks have been proposed to explicitly perform object-level or patch-level similarity learning.

Contrastive Learning • Semantic Segmentation • +3

State-Aware Tracker for Real-Time Video Object Segmentation

MegviiDetection/video_analyst CVPR 2020

For higher efficiency, SAT takes advantage of the inter-frame consistency and deals with each target object as a tracklet.

Semantic Segmentation • Semi-Supervised Video Object Segmentation • +1