98 papers with code • 1 benchmark • 9 datasets
End-to-end sequential learning to explore spatial-temporal features for video segmentation is largely limited by the scale of available video segmentation datasets, i.e., even the largest video segmentation dataset contains only 90 short video clips.
We describe our development and demonstrate the use of our solver in a video segmentation task and in meta-learning for few-shot learning.
VISOR annotates videos from EPIC-KITCHENS, which poses a new set of challenges not encountered in current video segmentation datasets.
Sign language translation (SLT) aims to interpret sign video sequences into text-based natural language sentences.