The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images.
5,147 PAPERS • 66 BENCHMARKS
Cityscapes is a large-scale database which focuses on semantic understanding of urban street scenes. It provides semantic, instance-wise, and dense pixel annotations for 30 classes grouped into 8 categories (flat surfaces, humans, vehicles, constructions, objects, nature, sky, and void). The dataset consists of around 5000 fine annotated images and 20000 coarse annotated ones. Data was captured in 50 cities during several months, daytimes, and good weather conditions. It was originally recorded as video so the frames were manually selected to have the following features: large number of dynamic objects, varying scene layout, and varying background.
1,923 PAPERS • 28 BENCHMARKS
KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) is one of the most popular datasets for use in mobile robotics and autonomous driving. It consists of hours of traffic scenarios recorded with a variety of sensor modalities, including high-resolution RGB, grayscale stereo cameras, and a 3D laser scanner. Despite its popularity, the dataset itself does not contain ground truth for semantic segmentation. However, various researchers have manually annotated parts of the dataset to fit their necessities. Álvarez et al. generated ground truth for 323 images from the road detection challenge with three classes: road, vertical, and sky. Zhang et al. annotated 252 (140 for training and 112 for testing) acquisitions – RGB and Velodyne scans – from the tracking challenge for ten object categories: building, sky, road, vegetation, sidewalk, car, pedestrian, cyclist, sign/pole, and fence. Ros et al. labeled 170 training images and 46 testing images (from the visual odome
1,850 PAPERS • 101 BENCHMARKS
ScanNet is an instance-level indoor RGB-D dataset that includes both 2D and 3D data. It is a collection of labeled voxels rather than points or objects. Up to now, ScanNet v2, the newest version of ScanNet, has collected 1513 annotated scans with an approximate 90% surface coverage. In the semantic segmentation task, this dataset is marked in 20 classes of annotated 3D voxelized objects.
422 PAPERS • 14 BENCHMARKS
SemanticKITTI is a large-scale outdoor-scene dataset for point cloud semantic segmentation. It is derived from the KITTI Vision Odometry Benchmark which it extends with dense point-wise annotations for the complete 360 field-of-view of the employed automotive LiDAR. The dataset consists of 22 sequences. Overall, the dataset provides 23201 point clouds for training and 20351 for testing.
155 PAPERS • 4 BENCHMARKS
Datasets drive vision progress, yet existing driving datasets are impoverished in terms of visual content and supported tasks to study multitask learning for autonomous driving. Researchers are usually constrained to study a small set of problems on one dataset, while real-world computer vision applications require performing tasks of various complexities. We construct BDD100K, the largest driving video dataset with 100K videos and 10 tasks to evaluate the exciting progress of image recognition algorithms on autonomous driving. The dataset possesses geographic, environmental, and weather diversity, which is useful for training models that are less likely to be surprised by new conditions. Based on this diverse dataset, we build a benchmark for heterogeneous multitask learning and study how to solve the tasks together. Our experiments show that special training strategies are needed for existing models to perform such heterogeneous tasks. BDD100K opens the door for future studies in thi
68 PAPERS • 9 BENCHMARKS
Mapillary Vistas Dataset is a diverse street-level imagery dataset with pixel‑accurate and instance‑specific human annotations for understanding street scenes around the world.
57 PAPERS • 3 BENCHMARKS
The Cityscapes Panoptic Parts dataset introduces part-aware panoptic segmentation annotations for the Cityscapes dataset. It extends the original panoptic annotations for the Cityscapes dataset with part-level annotations for selected scene-level classes.
2 PAPERS • NO BENCHMARKS YET
PASTIS is a benchmark dataset for panoptic and semantic segmentation of agricultural parcels from satellite image time series. It is composed of 2433 one square kilometer-patches in the French metropolitan territory for which sequences of satellite observations are assembled into a four-dimensional spatio-temporal tensor. The dataset contains both semantic and instance annotations, assigning to each pixel a semantic label and an instance id. There is an official 5 fold split provided in the dataset's metadata.
2 PAPERS • 2 BENCHMARKS
Panoptic nuScenes is a benchmark dataset that extends the popular nuScenes dataset with point-wise groundtruth annotations for semantic segmentation, panoptic segmentation, and panoptic tracking tasks.
2 PAPERS • NO BENCHMARKS YET
The Pascal Panoptic Parts dataset consists of annotations for the part-aware panoptic segmentation task on the PASCAL VOC 2010 dataset. It is created by merging scene-level labels from PASCAL-Context with part-level labels from PASCAL-Part
2 PAPERS • NO BENCHMARKS YET
Lemon dataset has been prepared to investigate the possibilities to tackle the issue of fruit quality control. It contains 2690 annotated images (1056 x 1056 pixels). Raw lemon images have been captured using the procedure described in the following blogpost and manually annotated using CVAT.
0 PAPER • NO BENCHMARKS YET