The Multi-Object Tracking and Segmentation (MOTS) benchmark [2] consists of 21 training sequences and 29 test sequences. It is based on the KITTI Tracking Evaluation 2012 and extends the annotations to the Multi-Object Tracking and Segmentation (MOTS) task. To this end, we added dense pixel-wise segmentation labels for every object. We evaluate submitted results using the metrics HOTA, CLEAR MOT, and MT/PT/ML, and rank methods by HOTA [1] (adapted for the segmentation case). Evaluation is performed using the code from the TrackEval repository. [1] J. Luiten, A. Ošep, P. Dendorfer, P. Torr, A. Geiger, L. Leal-Taixé, B. Leibe: MOTS: Multi-Object Tracking and Segmentation. CVPR 2019.
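The segmentation-adapted metrics mentioned above score matches between predicted and ground-truth objects by pixel-wise mask overlap rather than bounding-box overlap. A minimal sketch of that core quantity, mask IoU, using NumPy (the function name `mask_iou` is illustrative, not taken from TrackEval):

```python
import numpy as np

def mask_iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Intersection-over-union between two boolean segmentation masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(inter) / float(union) if union > 0 else 0.0

# Toy example: two overlapping 4x4 masks of 4 pixels each,
# sharing a single pixel (union of 7 pixels -> IoU = 1/7).
a = np.zeros((4, 4), dtype=bool); a[0:2, 0:2] = True
b = np.zeros((4, 4), dtype=bool); b[1:3, 1:3] = True
print(mask_iou(a, b))  # 1/7, approximately 0.143
```

In segmentation-based tracking evaluation, a prediction is typically counted as matching a ground-truth object only when this IoU exceeds a threshold (commonly 0.5), and the higher-level metrics (HOTA, CLEAR MOT) are accumulated over those matches.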
26 PAPERS • 1 BENCHMARK
Video object segmentation has been studied extensively in the past decade due to its importance in understanding video spatial-temporal structures as well as its value in industrial applications. Previously, we presented the first large-scale video object segmentation dataset, named YouTubeVOS, and hosted the Large-scale Video Object Segmentation Challenge in conjunction with ECCV 2018 and ICCV 2019. This year, we are thrilled to invite you to the 4th Large-scale Video Object Segmentation Challenge in conjunction with CVPR 2022.
5 PAPERS • 1 BENCHMARK
…CARLA-based synthetic multi-vehicle multi-camera tracking dataset and includes ground truth for 2D detection and tracking, 3D detection and tracking, depth estimation, and semantic, instance and panoptic segmentation.
2 PAPERS • 1 BENCHMARK
…For each sequence we provide multiple sets of images containing RGB, depth, class segmentation, instance segmentation, flow, and scene flow data.
31 PAPERS • 1 BENCHMARK
…Note that this implies TAO-Amodal also includes modal segmentation masks (as visualized in the color overlays above).
1 PAPER • NO BENCHMARKS YET
…synthetic video dataset designed to learn and evaluate computer vision models for several video understanding tasks: object detection and multi-object tracking, scene-level and instance-level semantic segmentation
119 PAPERS • 1 BENCHMARK
…We benchmark four foundational video understanding tasks: action recognition, action segmentation, object detection and multi-object tracking.
…Despite its popularity, the dataset itself does not contain ground truth for semantic segmentation. However, various researchers have manually annotated parts of the dataset to fit their needs.
3,169 PAPERS • 139 BENCHMARKS