The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images.
4,158 PAPERS • 57 BENCHMARKS
KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) is one of the most popular datasets for use in mobile robotics and autonomous driving. It consists of hours of traffic scenarios recorded with a variety of sensor modalities, including high-resolution RGB, grayscale stereo cameras, and a 3D laser scanner. Despite its popularity, the dataset itself does not contain ground truth for semantic segmentation. However, various researchers have manually annotated parts of the dataset to fit their necessities. Álvarez et al. generated ground truth for 323 images from the road detection challenge with three classes: road, vertical, and sky. Zhang et al. annotated 252 (140 for training and 112 for testing) acquisitions – RGB and Velodyne scans – from the tracking challenge for ten object categories: building, sky, road, vegetation, sidewalk, car, pedestrian, cyclist, sign/pole, and fence. Ros et al. labeled 170 training images and 46 testing images (from the visual odome
1,601 PAPERS • 93 BENCHMARKS
The PASCAL Visual Object Classes (VOC) 2012 dataset contains 20 object categories including vehicles, household, animals, and other: aeroplane, bicycle, boat, bus, car, motorbike, train, bottle, chair, dining table, potted plant, sofa, TV/monitor, bird, cat, cow, dog, horse, sheep, and person. Each image in this dataset has pixel-level segmentation annotations, bounding box annotations, and object class annotations. This dataset has been widely used as a benchmark for object detection, semantic segmentation, and classification tasks. The PASCAL VOC dataset is split into three subsets: 1,464 images for training, 1,449 images for validation and a private testing set.
180 PAPERS • 39 BENCHMARKS
PASCAL VOC 2007 is a dataset for image recognition. The twenty object classes that have been selected are:
87 PAPERS • 10 BENCHMARKS
The Mall is a dataset for crowd counting and profiling research. Its images are collected from publicly accessible webcam. It mainly includes 2,000 video frames, and the head position of every pedestrian in all frames is annotated. A total of more than 60,000 pedestrians are annotated in this dataset.
46 PAPERS • 1 BENCHMARK
Honda Research Institute Driving Dataset (HDD) is a dataset to enable research on learning driver behavior in real-life environments. The dataset includes 104 hours of real human driving in the San Francisco Bay Area collected using an instrumented vehicle equipped with different sensors.
22 PAPERS • NO BENCHMARKS YET
The SIXray dataset is constructed by the Pattern Recognition and Intelligent System Development Laboratory, University of Chinese Academy of Sciences. It contains 1,059,231 X-ray images which are collected from some several subway stations. There are six common categories of prohibited items, namely, gun, knife, wrench, pliers, scissors and hammer. It has three subsets called SIXray10, SIXray100 and SIXray1000, There are image-level annotations provided by human security inspectors for the whole dataset. In addition the images in the test set are annotated with a bounding-box for each prohibited item to evaluate the performance of object localization.
11 PAPERS • NO BENCHMARKS YET
Contains 51,583 descriptions of 11,046 objects from 800 ScanNet scenes. ScanRefer is the first large-scale effort to perform object localization via natural language expression directly in 3D.
3 PAPERS • NO BENCHMARKS YET