53 papers with code • 1 benchmark • 6 datasets
End-to-end sequential learning of spatio-temporal features for video segmentation is largely limited by the scale of available video segmentation datasets, i.e., even the largest video segmentation dataset contains only 90 short video clips.
We describe our development and demonstrate our solver on a video segmentation task and on meta-learning for few-shot learning.
Sign language translation (SLT) aims to translate sign language video sequences into natural-language text sentences.
This paper presents a novel task together with a new benchmark for detecting generic, taxonomy-free event boundaries that segment a whole video into chunks.
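Taxonomy-free event boundary detection can be illustrated with a simple frame-differencing baseline (a toy sketch, not the benchmark's actual method): flag frames whose change relative to the previous frame is a statistical outlier.

```python
import numpy as np

def detect_boundaries(frames, z_thresh=2.0):
    """Toy event-boundary detector: flag frames whose mean absolute
    difference to the previous frame is an outlier (z-score above
    z_thresh). Illustrative baseline only, not the benchmark's method.

    frames: float array of shape (T, H, W).
    Returns a list of frame indices treated as boundaries.
    """
    diffs = np.abs(np.diff(frames, axis=0)).mean(axis=(1, 2))  # (T-1,)
    mu, sigma = diffs.mean(), diffs.std() + 1e-8
    z = (diffs - mu) / sigma
    return [int(i) + 1 for i in np.nonzero(z > z_thresh)[0]]

# Synthetic clip: 20 gray frames with an abrupt brightness change at t=10.
clip = np.zeros((20, 8, 8))
clip[10:] = 1.0
print(detect_boundaries(clip))  # → [10]
```

A real detector must also handle gradual transitions and camera motion, which this thresholding baseline deliberately ignores.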
We explore the efficiency of CRF inference beyond image-level semantic segmentation and perform joint inference across video frames.
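Joint CRF inference over frames can be sketched with a minimal mean-field update (an illustrative model, not the paper's: per-pixel unaries plus a Potts pairwise term linking the same pixel in adjacent frames).

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def meanfield_video_crf(unary, w_temporal=1.0, n_iters=5):
    """Mean-field inference for a toy video CRF (sketch, not the paper's
    model). unary: label costs of shape (T, H, W, L). A Potts pairwise
    term links each pixel to the same pixel in frames t-1 and t+1.
    Returns approximate marginals Q of shape (T, H, W, L)."""
    Q = softmax(-unary)
    for _ in range(n_iters):
        # Temporal message: neighbors' expected label distributions.
        msg = np.zeros_like(Q)
        msg[1:] += Q[:-1]
        msg[:-1] += Q[1:]
        # Potts penalty: mass neighbors put on *other* labels.
        pairwise = w_temporal * (msg.sum(-1, keepdims=True) - msg)
        Q = softmax(-(unary + pairwise))
    return Q

# One pixel, three frames, two labels: the middle frame's unary weakly
# prefers label 1, but both temporal neighbors strongly prefer label 0.
unary = np.zeros((3, 1, 1, 2))
unary[0, 0, 0] = [0.0, 5.0]
unary[1, 0, 0] = [1.0, 0.0]
unary[2, 0, 0] = [0.0, 5.0]
Q = meanfield_video_crf(unary)
print(Q[1, 0, 0].argmax())  # temporal smoothing flips the middle frame to 0
```

Real video CRFs add spatial and appearance (bilateral) pairwise terms; this sketch keeps only the temporal coupling to show the joint-inference mechanics.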
The dataset, named DAVIS (Densely Annotated VIdeo Segmentation), consists of fifty high-quality, Full HD video sequences, spanning multiple occurrences of common video object segmentation challenges such as occlusions, motion blur and appearance changes.
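The published DAVIS release pairs RGB frames with per-frame annotation masks by filename. A minimal loader sketch, assuming the standard directory layout (`JPEGImages/<resolution>/<sequence>/*.jpg`, `Annotations/<resolution>/<sequence>/*.png`; adjust if your copy differs):

```python
from pathlib import Path

def pair_davis_frames(root, sequence, resolution="480p"):
    """Pair RGB frames with annotation masks for one DAVIS sequence.

    Assumes the standard DAVIS layout (JPEGImages/<res>/<seq>/*.jpg,
    Annotations/<res>/<seq>/*.png). Returns a sorted list of
    (frame_path, mask_path) tuples; mask_path is None for frames
    without a ground-truth annotation.
    """
    root = Path(root)
    frames = sorted((root / "JPEGImages" / resolution / sequence).glob("*.jpg"))
    masks = {p.stem: p for p in
             (root / "Annotations" / resolution / sequence).glob("*.png")}
    return [(f, masks.get(f.stem)) for f in frames]
```

Matching by stem (e.g. `00000.jpg` ↔ `00000.png`) keeps the loader robust to sparsely annotated sequences, where only a subset of frames carry masks.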