Synthetic training dataset of 50,000 depth images and 320,000 object masks, generated using simulated heaps of 3D CAD models.
3 PAPERS • 1 BENCHMARK
The Multi-Object Tracking and Segmentation (MOTS) benchmark [2] consists of 21 training sequences and 29 test sequences. It is based on the KITTI Tracking Evaluation 2012 and extends the annotations to the Multi-Object Tracking and Segmentation (MOTS) task. To this end, we added dense pixel-wise segmentation labels for every object. We evaluate submitted results using the metrics HOTA, CLEAR MOT, and MT/PT/ML. We rank methods by HOTA [1] (adapted for the segmentation case). Evaluation is performed using the code from the TrackEval repository. [1] J. Luiten, A. Os̆ep, P. Dendorfer, P. Torr, A. Geiger, L. Leal-Taixé, B. Leibe: HOTA: A Higher Order Metric for Evaluating Multi-Object Tracking. IJCV 2020. [2] P. Voigtlaender, M. Krause, A. Os̆ep, J. Luiten, B. B. G. Sekar, A. Geiger, B. Leibe: MOTS: Multi-Object Tracking and Segmentation. CVPR 2019.
26 PAPERS • 1 BENCHMARK
Human fibrosarcoma HT1080WT (ATCC) cells at low cell densities embedded in 3D collagen type I matrices [1]. The time-lapse videos were recorded every 2 minutes for 16.7 hours and covered a field of view of 1002 pixels × 1004 pixels with a pixel size of 0.802 μm/pixel. The videos were pre-processed to correct frame-to-frame drift artifacts, resulting in a final size of 983 pixels × 985 pixels.
1 PAPER • NO BENCHMARKS YET
The GOT-10k dataset contains more than 10,000 video segments of real-world moving objects and over 1.5 million manually labelled bounding boxes.
203 PAPERS • 2 BENCHMARKS
…For each sequence we provide multiple sets of images containing RGB, depth, class segmentation, instance segmentation, flow, and scene flow data.
33 PAPERS • 1 BENCHMARK
…In total, 103 scenes of 10 common off-the-shelf objects with rich textures are recorded, with each frame annotated with a per-pixel semantic segmentation and ground-truth object poses provided by a commercial
2 PAPERS • NO BENCHMARKS YET
…The videos are densely annotated with six types of labels: object and point tracks, temporal action and sound segments, multiple-choice video question-answers and grounded video question-answers.
4 PAPERS • NO BENCHMARKS YET
…synthetic video dataset designed to learn and evaluate computer vision models for several video understanding tasks: object detection and multi-object tracking, scene-level and instance-level semantic segmentation
120 PAPERS • 1 BENCHMARK
…Despite its popularity, the dataset itself does not contain ground truth for semantic segmentation. However, various researchers have manually annotated parts of the dataset to fit their necessities.
3,219 PAPERS • 141 BENCHMARKS