13 dataset results for Semi-Supervised Video Object Segmentation

DAVIS (Densely Annotated VIdeo Segmentation)

The Densely Annotated Video Segmentation (DAVIS) dataset is a high-quality, high-resolution video segmentation dataset provided at two resolutions, 480p and 1080p. It contains 50 video sequences with 3,455 frames densely annotated at the pixel level: 30 videos (2,079 frames) for training and 20 videos (1,376 frames) for validation.

634 PAPERS • 13 BENCHMARKS

DAVIS 2017

DAVIS17 is a dataset for video object segmentation. It contains 150 videos in total: 60 for training, 30 for validation, and 60 for testing.

270 PAPERS • 11 BENCHMARKS

DAVIS 2016

DAVIS16 is a dataset for video object segmentation which consists of 50 videos in total (30 videos for training and 20 for testing). Per-frame pixel-wise annotations are offered.

216 PAPERS • 4 BENCHMARKS

YouTube-VOS 2018 (Youtube Video Object Segmentation)

YouTube-VOS is a video object segmentation dataset containing 4,453 videos: 3,471 for training, 474 for validation, and 508 for testing. The training and validation videos have pixel-level ground-truth annotations for every 5th frame (6 fps). It also contains instance segmentation annotations. The dataset covers more than 7,800 unique objects, includes 190k high-quality manual annotations, and runs for more than 340 minutes in total.
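As a rough illustration of what "every 5th frame (6 fps)" means in practice, the sketch below lists which frame indices would carry a ground-truth mask and what annotation rate results. The function names are hypothetical, and it assumes that annotation starts at frame 0 and that the source video runs at 30 fps (which is what a stride of 5 at 6 fps implies); check both against the actual release.

```python
from __future__ import annotations


def annotated_frame_indices(num_frames: int, stride: int = 5) -> list[int]:
    """Return indices of frames assumed to carry a ground-truth mask.

    Assumption: the first frame (index 0) is annotated and every
    `stride`-th frame after it.
    """
    if num_frames < 0 or stride <= 0:
        raise ValueError("num_frames must be >= 0 and stride must be > 0")
    return list(range(0, num_frames, stride))


def annotation_rate_fps(video_fps: float, stride: int = 5) -> float:
    """Annotation rate obtained when every `stride`-th frame is labeled."""
    if stride <= 0:
        raise ValueError("stride must be > 0")
    return video_fps / stride


if __name__ == "__main__":
    # A hypothetical 12-frame clip: frames 0, 5 and 10 would be annotated.
    print(annotated_frame_indices(12))
    # 30 fps video with a stride of 5 gives the 6 fps annotation rate.
    print(annotation_rate_fps(30.0))
```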

174 PAPERS • 10 BENCHMARKS

Referring Expressions for DAVIS 2016 & 2017

Our task is to localize and provide a pixel-level mask of an object on all video frames given a language referring expression obtained either by looking at the first frame only or the full video. To validate our approach we employ two popular video object segmentation datasets, DAVIS16 [38] and DAVIS17 [42]. These two datasets introduce various challenges, containing videos with single or multiple salient objects, crowded scenes, similar looking instances, occlusions, camera view changes, fast motion, etc.

75 PAPERS • 5 BENCHMARKS

VOTChallenge (Visual Object Tracking)

The Visual Object Tracking (VOT) dataset is a collection of video sequences used for evaluating and benchmarking visual object tracking algorithms. It provides a standardized platform for researchers and practitioners to assess the performance of different tracking methods.

30 PAPERS • 7 BENCHMARKS

MOSE (Complex Video Object Segmentation)

CoMplex video Object SEgmentation (MOSE) is a dataset for studying the tracking and segmentation of objects in complex environments. MOSE contains 2,149 video clips and 5,200 objects from 36 categories, with 431,725 high-quality object segmentation masks. Its most notable feature is complex scenes with crowded and occluded objects.

20 PAPERS • 1 BENCHMARK

BURST

BURST is a benchmark suite built upon TAO that requires tracking and segmenting multiple objects from camera video. The benchmark contains 6 different sub-tasks divided into 2 groups that all share the same data for training/validation/testing.

14 PAPERS • 5 BENCHMARKS

BL30K

BL30K is a synthetic dataset rendered in Blender using ShapeNet data. We break the dataset into six segments, each with approximately 5K videos. The videos are organized in a format similar to DAVIS and YouTubeVOS, so dataloaders for those datasets can be used directly. Each video is 160 frames long, and each frame has a resolution of 768×512. There are 3-5 objects per video, and each object follows a random smooth trajectory. We tried to optimize the trajectories greedily to minimize object intersection (not guaranteed); occlusions are still possible, as they happen often in real footage. See MiVOS for details.
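To illustrate why a shared folder layout lets one dataloader serve several datasets, here is a minimal sketch that pairs each frame with its mask. It assumes the common DAVIS/YouTubeVOS-style convention of a `JPEGImages/<video>/` folder of `.jpg` frames and an `Annotations/<video>/` folder of `.png` masks with matching file stems. The function name and default folder names are illustrative, not part of any official toolkit, and the exact BL30K layout should be checked against the release (DAVIS, for example, adds a resolution subfolder such as `480p`, which can be passed via `image_dir` and `mask_dir`).

```python
from __future__ import annotations

from pathlib import Path


def list_video_samples(
    root: str | Path,
    image_dir: str = "JPEGImages",   # assumed folder name for frames
    mask_dir: str = "Annotations",   # assumed folder name for masks
) -> dict[str, list[tuple[Path, Path | None]]]:
    """Map each video name to a sorted list of (frame, mask-or-None) pairs."""
    root = Path(root)
    samples: dict[str, list[tuple[Path, Path | None]]] = {}
    video_dirs = sorted(p for p in (root / image_dir).iterdir() if p.is_dir())
    for video in video_dirs:
        pairs = []
        for frame in sorted(video.glob("*.jpg")):
            # A mask is expected to share the frame's stem, e.g. 00000.png.
            mask = root / mask_dir / video.name / (frame.stem + ".png")
            pairs.append((frame, mask if mask.exists() else None))
        samples[video.name] = pairs
    return samples
```

Frames without a mask map to `None`, which fits sparsely annotated datasets as well as the semi-supervised setting, where only the first frame's mask is given at test time.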

11 PAPERS • NO BENCHMARKS YET

VOT2020

VOT2020 is a Visual Object Tracking benchmark for short-term tracking in RGB.

6 PAPERS • 1 BENCHMARK

Long Video Dataset

We randomly selected three videos from the Internet that are longer than 1.5K frames and in which the main objects appear continuously. Each video has 20 uniformly sampled frames that are manually annotated for evaluation.
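One way to picture "20 uniformly sampled frames" is as evenly spaced indices across the clip. The sketch below is only an illustration: the description does not state the exact sampling scheme, so the choice to include both the first and last frame, the rounding, and the function name are all assumptions.

```python
from __future__ import annotations


def uniform_frame_indices(num_frames: int, num_samples: int = 20) -> list[int]:
    """Pick `num_samples` evenly spaced frame indices from a clip.

    Assumption: both endpoints (first and last frame) are included.
    """
    if num_samples <= 0 or num_frames < num_samples:
        raise ValueError("need 0 < num_samples <= num_frames")
    if num_samples == 1:
        return [0]
    # Spacing between consecutive samples, as a float to avoid drift.
    step = (num_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]


if __name__ == "__main__":
    # A hypothetical 1,500-frame video, matching the "longer than 1.5K" scale.
    print(uniform_frame_indices(1500))
```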

5 PAPERS • 1 BENCHMARK

Long Video Dataset (3X)

2 PAPERS • 1 BENCHMARK

PUMaVOS (Partial and Unusual Masks for Video Object Segmentation)

PUMaVOS is a dataset of challenging and practical use cases inspired by the movie production industry.

1 PAPER • NO BENCHMARKS YET
