KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) is one of the most popular datasets for mobile robotics and autonomous driving. It consists of hours of traffic scenarios recorded with a variety of sensor modalities, including high-resolution RGB, grayscale stereo cameras, and a 3D laser scanner. Despite its popularity, the dataset itself does not contain ground truth for semantic segmentation. However, various researchers have manually annotated parts of the dataset to fit their needs. Álvarez et al. generated ground truth for 323 images from the road detection challenge with three classes: road, vertical, and sky. Zhang et al. annotated 252 acquisitions (140 for training and 112 for testing) – RGB and Velodyne scans – from the tracking challenge for ten object categories: building, sky, road, vegetation, sidewalk, car, pedestrian, cyclist, sign/pole, and fence. Ros et al. labeled 170 training images and 46 testing images from the visual odometry challenge.
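KITTI's Velodyne scans mentioned above are distributed as flat binary files of float32 (x, y, z, reflectance) tuples, one per point. A minimal sketch of reading one with NumPy, using a synthetic stand-in file rather than a real scan (the filename `demo_scan.bin` is purely illustrative):

```python
import numpy as np

def load_velodyne_scan(path):
    """Load a KITTI-style Velodyne scan: a flat binary file of
    float32 (x, y, z, reflectance) tuples, one per point."""
    return np.fromfile(path, dtype=np.float32).reshape(-1, 4)

# Synthetic stand-in for a real scan file such as "000000.bin".
points = np.array([[1.0, 2.0, 0.5, 0.3],
                   [4.0, -1.0, 0.2, 0.9]], dtype=np.float32)
points.tofile("demo_scan.bin")

scan = load_velodyne_scan("demo_scan.bin")
print(scan.shape)  # (2, 4): two points, four channels each
```

The reshape to `(-1, 4)` recovers the per-point structure regardless of how many points the scan contains.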
3,526 PAPERS • 140 BENCHMARKS
LVIS is a dataset for long-tail instance segmentation. It has annotations for over 1,000 object categories in 164k images.
522 PAPERS • 14 BENCHMARKS
Objects365 is a large-scale object detection dataset with 365 object categories across over 600K training images. More than 10 million high-quality bounding boxes were manually labeled through a carefully designed three-step annotation pipeline. It is the largest fully annotated object detection dataset so far and establishes a more challenging benchmark for the community.
150 PAPERS • 3 BENCHMARKS
PASCAL VOC 2007 is a dataset for image recognition. The twenty object classes that have been selected are: person; bird, cat, cow, dog, horse, sheep; aeroplane, bicycle, boat, bus, car, motorbike, train; bottle, chair, dining table, potted plant, sofa, tv/monitor.
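PASCAL VOC distributes per-image annotations as XML files, with one `<object>` element per labeled instance carrying a class name and a pixel bounding box. A sketch of parsing that layout with the standard library, using a small synthetic annotation (the filename and box values are illustrative, not taken from the real dataset):

```python
import xml.etree.ElementTree as ET

# A minimal, synthetic VOC-style annotation: each <object> carries a
# class name and a pixel bounding box (xmin, ymin, xmax, ymax).
VOC_XML = """
<annotation>
  <filename>000001.jpg</filename>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>
"""

def parse_voc(xml_text):
    """Return a list of (class_name, (xmin, ymin, xmax, ymax)) tuples."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        box = obj.find("bndbox")
        coords = tuple(int(box.findtext(k))
                       for k in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((name, coords))
    return objects

print(parse_voc(VOC_XML))
# [('dog', (48, 240, 195, 371)), ('person', (8, 12, 352, 498))]
```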
124 PAPERS • 14 BENCHMARKS
In Clipart1k, the target domain classes to be detected are the same as those in the source domain. All the images for the clipart domain were collected from one dataset (i.e., CMPlaces) and two image search engines (i.e., Openclipart and Pixabay). The search queries are the 205 scene classes (e.g., pasture) used in CMPlaces, so as to collect various objects and scenes with complex backgrounds.
45 PAPERS • NO BENCHMARKS YET
Watercolor2k is a dataset used for cross-domain object detection which contains 2k watercolor images with image and instance-level annotations.
37 PAPERS • 4 BENCHMARKS
Comic2k is a dataset used for cross-domain object detection which contains 2k comic images with image and instance-level annotations. Image Source: https://naoto0804.github.io/cross_domain_detection/
28 PAPERS • 5 BENCHMARKS
UVO is a new benchmark for open-world class-agnostic object segmentation in videos. Besides shifting the problem focus to the open-world setup, UVO is significantly larger, providing approximately 8 times more videos than DAVIS and 7 times more mask (instance) annotations per video than YouTube-VOS and YouTube-VIS. UVO is also more challenging, as it includes many videos with crowded scenes and complex background motions.
26 PAPERS • 3 BENCHMARKS
OpenImages V6 is a large-scale dataset consisting of 9 million training images, 41,620 validation samples, and 125,456 test samples. It is a partially annotated dataset with 9,600 trainable classes.
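Open Images ships its box annotations as large CSV files, where each row ties an image ID to a label and a box with coordinates normalized to [0, 1]. A sketch of grouping such rows by image, using a two-row synthetic sample; the column names follow the Open Images box-CSV layout as I understand it, and the IDs and label codes here are illustrative:

```python
import csv
import io

# A two-row synthetic sample in the Open Images box-CSV layout
# (subset of columns; coordinates are normalized to [0, 1]).
CSV_TEXT = """ImageID,LabelName,XMin,XMax,YMin,YMax
0001eeaf4aed83f9,/m/0cmf2,0.022,0.964,0.071,0.800
0001eeaf4aed83f9,/m/0k4j,0.141,0.522,0.439,0.726
"""

def boxes_by_image(csv_text):
    """Group normalized boxes as (label, (xmin, ymin, xmax, ymax)) by ImageID."""
    grouped = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        box = tuple(float(row[k]) for k in ("XMin", "YMin", "XMax", "YMax"))
        grouped.setdefault(row["ImageID"], []).append((row["LabelName"], box))
    return grouped

grouped = boxes_by_image(CSV_TEXT)
print(len(grouped["0001eeaf4aed83f9"]))  # 2 boxes on this image
```

Streaming the file row-by-row with `csv.DictReader` keeps memory bounded, which matters at the dataset's real scale of millions of rows.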
21 PAPERS • 3 BENCHMARKS
The LeukemiaAttri dataset is a large-scale, multi-domain collection of microscopy images derived from leukemia patient samples, enriched with detailed morphological information. The dataset comprises a total of 28.9K images (2.4K × 2 × 3 × 2), captured with both low-cost and high-cost microscopes at three resolutions (10x, 40x, and 100x) using various cameras. In addition to providing location annotations for each white blood cell (WBC), the dataset includes comprehensive morphological attributes for every WBC, enhancing its utility for research and analysis in the field.
1 PAPER • 2 BENCHMARKS