Video Semantic Segmentation
150 papers with code • 2 benchmarks • 5 datasets
Many recent successful methods for video object segmentation (VOS) are overly complicated, rely heavily on fine-tuning on the first frame, and/or are slow, and are therefore of limited practical use.
Ranked #1 on Semi-Supervised Video Object Segmentation on YouTube
Compared with the non-local block, the proposed recurrent criss-cross attention module uses 11x less GPU memory.
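The memory saving comes from each position attending only to the H + W - 1 positions on its own row and column, rather than to all H x W positions as a non-local block does. A minimal NumPy sketch of one criss-cross attention pass (function name and shapes are illustrative, not the paper's implementation):

```python
import numpy as np

def criss_cross_attention(q, k, v):
    """One criss-cross attention pass over an (H, W, C) feature map.

    Each position (i, j) attends only to the H + W - 1 positions on its
    row and column, instead of all H * W positions as in a non-local block.
    """
    H, W, C = q.shape
    out = np.zeros_like(v)
    for i in range(H):
        for j in range(W):
            # Gather keys/values on the criss-cross path of (i, j).
            idx = [(i, jj) for jj in range(W)] + \
                  [(ii, j) for ii in range(H) if ii != i]
            keys = np.stack([k[a, b] for a, b in idx])   # (H+W-1, C)
            vals = np.stack([v[a, b] for a, b in idx])   # (H+W-1, C)
            scores = keys @ q[i, j]                      # (H+W-1,)
            weights = np.exp(scores - scores.max())
            weights /= weights.sum()                     # softmax over path
            out[i, j] = weights @ vals
    return out
```

Applying the module twice ("recurrent" criss-cross) lets information from any position reach any other, approximating full-image attention at a fraction of the memory.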
Ranked #6 on Semantic Segmentation on FoodSeg103 (using extra training data)
Yet it is non-trivial to transfer state-of-the-art image recognition networks to video, as per-frame evaluation is too slow to be affordable.
Ranked #8 on Video Semantic Segmentation on Cityscapes val
To learn generalizable representations for correspondence at large scale, a variety of self-supervised pretext tasks have been proposed that explicitly perform object-level or patch-level similarity learning.
We introduce a self-supervised method for learning visual correspondence from unlabeled video.
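At the core of such correspondence methods is a soft matching between patch features of two frames: a temperature-scaled softmax over cosine similarities, whose rows can propagate labels (e.g. segmentation masks) from one frame to the next. A minimal sketch under assumed shapes (the function name and temperature value are illustrative, not taken from any specific paper):

```python
import numpy as np

def soft_correspondence(feat_a, feat_b, temperature=0.07):
    """Soft matching between patch features of two frames.

    feat_a: (N, C) patch features from frame A; feat_b: (M, C) from frame B.
    Returns an (N, M) affinity matrix: row i is a softmax over frame-B
    patches, usable to propagate labels from frame B to patch i of frame A.
    """
    # L2-normalize so the dot product is cosine similarity.
    a = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    b = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    logits = (a @ b.T) / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    aff = np.exp(logits)
    return aff / aff.sum(axis=1, keepdims=True)
```

Self-supervision then comes for free from video: matching forward through time and back again should return each patch to itself, so cycle-consistency of these affinities serves as the training signal without any labels.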