DAVIS17 is a dataset for video object segmentation. It contains a total of 150 videos - 60 for training, 30 for validation, 60 for testing
249 PAPERS • 11 BENCHMARKS
DAVIS16 is a dataset for video object segmentation which consists of 50 videos in total (30 videos for training and 20 for testing). Per-frame pixel-wise annotations are offered.
209 PAPERS • 4 BENCHMARKS
The Freiburg-Berkeley Motion Segmentation Dataset (FBMS-59) is an extension of the BMS dataset with 33 additional video sequences. A total of 720 frames is annotated. It has pixel-accurate segmentation annotations of moving objects. FBMS-59 comes with a split into a training set and a test set.
115 PAPERS • 3 BENCHMARKS
SegTrack v2 is a video segmentation dataset with full pixel-level annotations on multiple objects at each frame within each video.
102 PAPERS • 4 BENCHMARKS
Our task is to localize and provide a pixel-level mask of an object on all video frames given a language referring expression obtained either by looking at the first frame only or the full video. To validate our approach we employ two popular video object segmentation datasets, DAVIS16  and DAVIS17 . These two datasets introduce various challenges, containing videos with single or multiple salient objects, crowded scenes, similar looking instances, occlusions, camera view changes, fast motion, etc.
73 PAPERS • 5 BENCHMARKS
The Freiburg-Berkeley Motion Segmentation Dataset (FBMS-59) is a dataset for motion segmentation, which extends the BMS-26 dataset with 33 additional video sequences. A total of 720 frames is annotated. FBMS-59 comes with a split into a training set and a test set. Typical challenges appear in both sets.
17 PAPERS • 2 BENCHMARKS
CoMplex video Object SEgmentation (MOSE) is a dataset to study the tracking and segmenting objects in complex environments. MOSE contains 2,149 video clips and 5,200 objects from 36 categories, with 431,725 high-quality object segmentation masks. The most notable feature of MOSE dataset is complex scenes with crowded and occluded objects.
17 PAPERS • 1 BENCHMARK
BL30K is a synthetic dataset rendered using Blender with ShapeNet's data. We break the dataset into six segments, each with approximately 5K videos. The videos are organized in a similar format as DAVIS and YouTubeVOS, so dataloaders for those datasets can be used directly. Each video is 160 frames long, and each frame has a resolution of 768*512. There are 3-5 objects per video, and each object has a random smooth trajectory -- we tried to optimize the trajectories in a greedy fashion to minimize object intersection (not guaranteed), with occlusions still possible (happen a lot in reality). See MiVOS for details.
11 PAPERS • NO BENCHMARKS YET