The Densely Annotated Video Segmentation dataset (DAVIS) is a high-quality, high-resolution video segmentation dataset with dense per-frame annotations, released at two resolutions, 480p and 1080p.
634 PAPERS • 13 BENCHMARKS
YouTube-VOS is a Video Object Segmentation dataset that contains 4,453 videos: 3,471 for training, 474 for validation, and 508 for testing. It also contains instance segmentation annotations. It has more than 7,800 unique objects, 190k high-quality manual annotations, and more than 340 minutes of total video.
174 PAPERS • 10 BENCHMARKS
CoMplex video Object SEgmentation (MOSE) is a dataset for studying the tracking and segmentation of objects in complex environments. MOSE contains 2,149 video clips and 5,200 objects from 36 categories, with 431,725 high-quality object segmentation masks.
20 PAPERS • 1 BENCHMARK
The Partial and Unusual Masks for Video Object Segmentation (PUMaVOS) dataset has the following properties: 24 videos with 21,187 densely annotated frames, covering complex practical use cases such as object…
1 PAPER • NO BENCHMARKS YET
BURST is a benchmark suite built upon TAO that requires tracking and segmenting multiple objects from camera video. It defines five tasks:

Class-guided
- Common: track and segment all objects belonging to a set of 78 common classes (based on the COCO class set).
- Long-tail: track and segment all objects belonging to an extended set of 482 object classes (class label predictions are not required).

Exemplar-guided
- Mask: track and segment all objects in the video for which first-frame object masks are given. This task is identical to Video Object Segmentation (VOS).
- Box: track and segment all objects in the video for which first-frame object bounding boxes are given.
- Point: track and segment all objects in the video given only the (x, y) coordinates of the mask centroid in the first frame in which each object appears.
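The point exemplar in the last task is simply the centroid of the object's first-frame mask. A minimal sketch of how such a point can be derived from a binary mask (the function name and the use of NumPy are our assumptions, not part of BURST's tooling):

```python
import numpy as np

def mask_centroid(mask: np.ndarray) -> tuple[float, float]:
    """Return the (x, y) centroid of a binary segmentation mask."""
    ys, xs = np.nonzero(mask)  # row/column indices of foreground pixels
    if xs.size == 0:
        raise ValueError("mask has no foreground pixels")
    return float(xs.mean()), float(ys.mean())

# Toy 4x4 mask with a 2x2 foreground square in the top-left corner
mask = np.zeros((4, 4), dtype=bool)
mask[0:2, 0:2] = True
print(mask_centroid(mask))  # → (0.5, 0.5)
```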
14 PAPERS • 5 BENCHMARKS
DAVIS17 is a dataset for video object segmentation. It contains a total of 150 videos: 60 for training, 30 for validation, and 60 for testing.
270 PAPERS • 11 BENCHMARKS
DAVIS16 is a dataset for video object segmentation consisting of 50 videos in total (30 for training and 20 for testing). Per-frame pixel-wise annotations are provided.
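Per-frame pixel-wise masks like these are typically scored against predictions using region similarity, i.e. intersection-over-union. A minimal sketch, assuming boolean NumPy masks (the function name is ours, not part of the official DAVIS evaluation toolkit):

```python
import numpy as np

def region_similarity(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union of a predicted and a ground-truth boolean mask."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / union if union > 0 else 1.0  # both empty: treat as perfect

pred = np.array([[1, 1], [0, 0]], dtype=bool)
gt   = np.array([[1, 0], [1, 0]], dtype=bool)
print(region_similarity(pred, gt))  # 1 overlapping pixel over 3 in the union
```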
216 PAPERS • 4 BENCHMARKS
…To validate our approach we employ two popular video object segmentation datasets, DAVIS16 [38] and DAVIS17 [42]. For the multiple-object video segmentation task we consider DAVIS17. As our goal is to segment objects in videos using language specifications, we augment all objects annotated with mask labels in DAVIS16 and DAVIS17 with non-ambiguous referring expressions. (We quantified that only ~15% of the collected descriptions become invalid over time, and this does not strongly affect segmentation results, as the temporal consistency step helps to disambiguate them.) We believe the collected data will be of interest to the segmentation as well as the vision-and-language communities, providing an opportunity to explore language as an alternative input for video object segmentation.
75 PAPERS • 5 BENCHMARKS
…We break the dataset into six segments, each with approximately 5K videos.
11 PAPERS • NO BENCHMARKS YET