VISOR is a dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video.
1 PAPER • NO BENCHMARKS YET
The Densely Annotated Video Segmentation dataset (DAVIS) is a high-quality, high-resolution, densely annotated video segmentation dataset released at two resolutions, 480p and 1080p.
634 PAPERS • 13 BENCHMARKS
YouTube-VOS is a Video Object Segmentation dataset that contains 4,453 videos: 3,471 for training, 474 for validation, and 508 for testing. It also provides instance segmentation annotations. It covers more than 7,800 unique objects with 190k high-quality manual annotations and spans more than 340 minutes of video.
174 PAPERS • 10 BENCHMARKS
CoMplex video Object SEgmentation (MOSE) is a dataset for studying the tracking and segmentation of objects in complex environments. MOSE contains 2,149 video clips and 5,200 objects from 36 categories, with 431,725 high-quality object segmentation masks.
20 PAPERS • 1 BENCHMARK
AVSBench is a pixel-level audio-visual segmentation benchmark that provides ground-truth labels for sounding objects. Accordingly, three settings are studied: 1) semi-supervised audio-visual segmentation with a single sound source; 2) fully-supervised audio-visual segmentation with multiple sound sources; and 3) fully-supervised audio-visual semantic segmentation.
10 PAPERS • NO BENCHMARKS YET
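Pixel-level audio-visual segmentation benchmarks of this kind are typically scored by comparing predicted masks against the ground-truth sounding-object labels with intersection-over-union. A minimal sketch of that metric (not AVSBench's official evaluation code):

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union between two boolean segmentation masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return np.logical_and(pred, gt).sum() / union

# Toy example: predicted top half vs. ground-truth left half of a 4x4 frame
pred = np.zeros((4, 4), bool); pred[:2, :] = True
gt = np.zeros((4, 4), bool); gt[:, :2] = True
print(mask_iou(pred, gt))  # 4 overlapping pixels / 12 in the union
```

Per-frame IoU scores are then averaged over frames and videos to produce a dataset-level mIoU.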
ODMS is a dataset for learning Object Depth via Motion and Segmentation. ODMS training data are configurable and extensible, with each training example consisting of a series of object segmentation masks, camera movement distances, and ground truth object depth.
2 PAPERS • NO BENCHMARKS YET
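An ODMS training example pairs a sequence of object masks with camera movement distances and a ground-truth depth. A sketch of that structure and of the scale-change intuition behind depth-from-motion (field names and the toy estimator are illustrative, not the dataset's actual schema or method):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ODMSExample:
    """One training example: masks observed while the camera moves toward the object."""
    masks: np.ndarray      # (n, H, W) boolean segmentation masks
    distances: np.ndarray  # (n,) camera movement distances at each observation
    depth: float           # ground-truth object depth at the final observation

def scale_based_depth(example: ODMSExample) -> float:
    """Toy depth estimate from mask scale change between first and last observation.

    Under a pinhole model, the object's image area scales with 1/z^2, so the
    area ratio between two observations relates their depths:
    z0 / z1 = sqrt(a1 / a0) with z0 = z1 + travel  =>  z1 = travel / (sqrt(a1/a0) - 1)
    """
    a0 = example.masks[0].sum()
    a1 = example.masks[-1].sum()
    travel = example.distances[-1] - example.distances[0]
    return travel / (np.sqrt(a1 / a0) - 1.0)

# Synthetic check: moving 2.0 units closer quadruples the mask area,
# which is consistent with a final object depth of 2.0.
h, w = 64, 64
m0 = np.zeros((h, w), bool); m0[28:36, 28:36] = True   # 8x8 mask, far
m1 = np.zeros((h, w), bool); m1[24:40, 24:40] = True   # 16x16 mask, near
ex = ODMSExample(masks=np.stack([m0, m1]),
                 distances=np.array([0.0, 2.0]),
                 depth=2.0)
print(round(scale_based_depth(ex), 3))  # -> 2.0
```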
The Freiburg-Berkeley Motion Segmentation dataset (FBMS-59) extends the BMS dataset with 33 additional video sequences. A total of 720 frames are annotated with pixel-accurate segmentation of moving objects. FBMS-59 comes with a split into a training set and a test set.
118 PAPERS • 3 BENCHMARKS
The Partial and Unusual Masks for Video Object Segmentation (PUMaVOS) dataset has the following properties: 24 videos with 21,187 densely annotated frames; covers complex practical use cases such as object …
SegTrack v2 is a video segmentation dataset with full pixel-level annotations on multiple objects at each frame within each video.
102 PAPERS • 4 BENCHMARKS
DAVIS17 is a dataset for video object segmentation. It contains 150 videos in total: 60 for training, 30 for validation, and 60 for testing.
270 PAPERS • 11 BENCHMARKS
DAVIS16 is a dataset for video object segmentation which consists of 50 videos in total (30 videos for training and 20 for testing). Per-frame pixel-wise annotations are offered.
216 PAPERS • 4 BENCHMARKS
LVOS is a dataset for long-term video object segmentation (VOS). It consists of 220 videos with a total duration of 421 minutes.
9 PAPERS • NO BENCHMARKS YET
To validate our approach we employ two popular video object segmentation datasets, DAVIS16 [38] and DAVIS17 [42]. For the multiple-object video segmentation task we consider DAVIS17. As our goal is to segment objects in videos using language specifications, we augment all objects annotated with mask labels in DAVIS16 and DAVIS17 with non-ambiguous referring expressions. (We quantified that only ∼15% of the collected descriptions become invalid over time, and this does not strongly affect segmentation results, as the temporal consistency step helps to disambiguate some of them.) We believe the collected data will be of interest to the segmentation as well as the vision-and-language communities, providing an opportunity to explore language as an alternative input for video object segmentation.
75 PAPERS • 5 BENCHMARKS
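Augmenting mask-annotated objects with referring expressions amounts to attaching a language string to each (video, object-id) pair already present in the DAVIS annotations. A hypothetical record layout illustrating that pairing (field names and expressions are invented for illustration, not the released annotation schema):

```python
# Hypothetical referring-expression records keyed by DAVIS video name and
# per-video object id; contents are illustrative only.
referring_annotations = [
    {"video": "breakdance", "object_id": 1,
     "expression": "the man spinning on the ground"},
    {"video": "dogs-jump", "object_id": 2,
     "expression": "the brown dog on the right"},
]

def expressions_for(video: str, annotations: list) -> dict:
    """Collect the referring expression for each annotated object in one video."""
    return {a["object_id"]: a["expression"]
            for a in annotations if a["video"] == video}

print(expressions_for("breakdance", referring_annotations))
```

A language-driven segmentation model would then consume the expression alongside the frames, rather than (or in addition to) a first-frame mask.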
We break the dataset into six segments, each with approximately 5K videos.
11 PAPERS • NO BENCHMARKS YET
Annotations include segmentation masks and bounding boxes. For the full description of labels and metadata, check out the README.
0 PAPERS • NO BENCHMARKS YET