VISOR is a dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video.
1 PAPER • NO BENCHMARKS YET
The Densely Annotated Video Segmentation dataset (DAVIS) is a high-quality, high-resolution video segmentation dataset provided at two resolutions, 480p and 1080p.
638 PAPERS • 13 BENCHMARKS
YouTube-VOS is a video object segmentation dataset that contains 4,453 videos: 3,471 for training, 474 for validation, and 508 for testing. It also contains instance segmentation annotations. It has more than 7,800 unique objects, 190k high-quality manual annotations, and a total duration of more than 340 minutes.
175 PAPERS • 10 BENCHMARKS
CoMplex video Object SEgmentation (MOSE) is a dataset for studying the tracking and segmentation of objects in complex environments. MOSE contains 2,149 video clips and 5,200 objects from 36 categories, with 431,725 high-quality object segmentation masks.
21 PAPERS • 1 BENCHMARK
AVSBench is a pixel-level audio-visual segmentation benchmark that provides ground-truth labels for sounding objects. Three settings are studied: 1) semi-supervised audio-visual segmentation with a single sound source; 2) fully supervised audio-visual segmentation with multiple sound sources; and 3) fully supervised audio-visual semantic segmentation.
10 PAPERS • NO BENCHMARKS YET
The Freiburg-Berkeley Motion Segmentation Dataset (FBMS-59) is an extension of the BMS dataset with 33 additional video sequences. A total of 720 frames is annotated. It has pixel-accurate segmentation annotations of moving objects. FBMS-59 comes with a split into a training set and a test set.
119 PAPERS • 3 BENCHMARKS
SegTrack v2 is a video segmentation dataset with full pixel-level annotations on multiple objects at each frame within each video.
102 PAPERS • 4 BENCHMARKS
DAVIS17 is a dataset for video object segmentation. It contains a total of 150 videos: 60 for training, 30 for validation, and 60 for testing.
272 PAPERS • 11 BENCHMARKS
DAVIS16 is a dataset for video object segmentation which consists of 50 videos in total (30 videos for training and 20 for testing). Per-frame pixel-wise annotations are offered.
217 PAPERS • 4 BENCHMARKS
LVOS is a dataset for long-term video object segmentation (VOS). It consists of 220 videos with a total duration of 421 minutes.
…We break the dataset into six segments, each with approximately 5K videos.
11 PAPERS • NO BENCHMARKS YET
…Segmentation masks; bounding boxes. For the full description of labels and metadata, check out the README.
0 PAPERS • NO BENCHMARKS YET