The ImageNet dataset contains 14,197,122 annotated images according to the WordNet hierarchy. Since 2010 the dataset is used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a benchmark in image classification and object detection. The publicly released dataset contains a set of manually annotated training images. A set of test images is also released, with the manual annotations withheld. ILSVRC annotations fall into one of two categories: (1) image-level annotation of a binary label for the presence or absence of an object class in the image, e.g., “there are cars in this image” but “there are no tigers,” and (2) object-level annotation of a tight bounding box and class label around an object instance in the image, e.g., “there is a screwdriver centered at position (20,25) with width of 50 pixels and height of 30 pixels”. The ImageNet project does not own the copyright of the images, therefore only thumbnails and URLs of images are provided.
7,649 PAPERS • 80 BENCHMARKS
KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) is one of the most popular datasets for use in mobile robotics and autonomous driving. It consists of hours of traffic scenarios recorded with a variety of sensor modalities, including high-resolution RGB, grayscale stereo cameras, and a 3D laser scanner. Despite its popularity, the dataset itself does not contain ground truth for semantic segmentation. However, various researchers have manually annotated parts of the dataset to fit their necessities. Álvarez et al. generated ground truth for 323 images from the road detection challenge with three classes: road, vertical, and sky. Zhang et al. annotated 252 (140 for training and 112 for testing) acquisitions – RGB and Velodyne scans – from the tracking challenge for ten object categories: building, sky, road, vegetation, sidewalk, car, pedestrian, cyclist, sign/pole, and fence. Ros et al. labeled 170 training images and 46 testing images (from the visual odome
1,853 PAPERS • 101 BENCHMARKS
Market-1501 is a large-scale public benchmark dataset for person re-identification. It contains 1501 identities which are captured by six different cameras, and 32,668 pedestrian image bounding-boxes obtained using the Deformable Part Models pedestrian detector. Each person has 3.6 images on average at each viewpoint. The dataset is split into two parts: 750 identities are utilized for training and the remaining 751 identities are used for testing. In the official testing protocol 3,368 query images are selected as probe set to find the correct match across 19,732 reference gallery images.
616 PAPERS • 9 BENCHMARKS
Office-Home is a benchmark dataset for domain adaptation which contains 4 domains where each domain consists of 65 categories. The four domains are: Art – artistic images in the form of sketches, paintings, ornamentation, etc.; Clipart – collection of clipart images; Product – images of objects without a background and Real-World – images of objects captured with a regular camera. It contains 15,500 images, with an average of around 70 images per class and a maximum of 99 images in a class.
425 PAPERS • 9 BENCHMARKS
The Office dataset contains 31 object categories in three domains: Amazon, DSLR and Webcam. The 31 categories in the dataset consist of objects commonly encountered in office settings, such as keyboards, file cabinets, and laptops. The Amazon domain contains on average 90 images per class and 2817 images in total. As these images were captured from a website of online merchants, they are captured against clean background and at a unified scale. The DSLR domain contains 498 low-noise high resolution images (4288×2848). There are 5 objects per category. Each object was captured from different viewpoints on average 3 times. For Webcam, the 795 images of low resolution (640×480) exhibit significant noise and color as well as white balance artifacts.
373 PAPERS • 10 BENCHMARKS
The SYNTHIA dataset is a synthetic dataset that consists of 9400 multi-viewpoint photo-realistic frames rendered from a virtual city and comes with pixel-level semantic annotations for 13 classes. Each frame has resolution of 1280 × 960.
298 PAPERS • 9 BENCHMARKS
The GTA5 dataset contains 24966 synthetic images with pixel level semantic annotation. The images have been rendered using the open-world video game Grand Theft Auto 5 and are all from the car perspective in the streets of American-style virtual cities. There are 19 semantic classes which are compatible with the ones of Cityscapes dataset.
230 PAPERS • 6 BENCHMARKS
ImageNet-C is an open source data set that consists of algorithmically generated corruptions (blur, noise) applied to the ImageNet test-set.
218 PAPERS • 3 BENCHMARKS
VisDA-2017 is a simulation-to-real dataset for domain adaptation with over 280,000 images across 12 categories in the training, validation and testing domains. The training images are generated from the same object under different circumstances, while the validation images are collected from MSCOCO..
110 PAPERS • 3 BENCHMARKS
Foggy Cityscapes is a synthetic foggy dataset which simulates fog on real scenes. Each foggy image is rendered with a clear image and depth map from Cityscapes. Thus the annotations and data split in Foggy Cityscapes are inherited from Cityscapes.
106 PAPERS • 3 BENCHMARKS
The ImageNet-A dataset consists of real-world, unmodified, and naturally occurring examples that are misclassified by ResNet models.
83 PAPERS • 4 BENCHMARKS
ImageNet-R(endition) contains art, cartoons, deviantart, graffiti, embroidery, graphics, origami, paintings, patterns, plastic objects, plush objects, sculptures, sketches, tattoos, toys, and video game renditions of ImageNet classes.
66 PAPERS • 3 BENCHMARKS
SIM10k is a synthetic dataset containing 10,000 images, which is rendered from the video game Grand Theft Auto V (GTA5).
29 PAPERS • 2 BENCHMARKS
The Cross-dataset Testbed is a Decaf7 based cross-dataset image classification dataset, which contains 40 categories of images from 3 domains: 3,847 images in Caltech256, 4,000 images in ImageNet, and 2,626 images for SUN. In total there are 10,473 images of 40 categories from these three domains.
7 PAPERS • NO BENCHMARKS YET
MegaAge is a large dataset that consists of 41,941 faces annotated with age posterior distributions.
4 PAPERS • NO BENCHMARKS YET
Adaptiope is a domain adaptation dataset with 123 classes in the three domains synthetic, product and real life. One of the main goals of Adaptiope is to offer a clean and well curated set of images for domain adaptation. This was necessary as many other common datasets in the area suffer from label noise and low quality images. Additionally, Adaptiope's class set was chosen in a way that minimizes the overlap with the class set of the commonly used ImageNet pretraining, therefore preventing information leakage in a domain adaptation setup.
3 PAPERS • NO BENCHMARKS YET
Modern Office-31 is a refurbished version of the commonly used Office-31 dataset. Modern Office-31 rectifies many of the annotation errors and low quality images in the Amazon domain of the original Office-31 dataset. Additionally, this dataset adds another synthetic domain based on the Adaptiope dataset.
3 PAPERS • NO BENCHMARKS YET
Libri-Adapt aims to support unsupervised domain adaptation research on speech recognition models.
2 PAPERS • 2 BENCHMARKS
This dataset contains 114 individuals including 1824 images captured from two disjoint camera views. For each person, eight images are captured from eight different orientations under one camera view and are normalized to 128x48 pixels. This dataset is also split into two parts randomly. One contains 57 individuals for training, and the other contains 57 individuals for testing.
1 PAPER • NO BENCHMARKS YET
The Sims4Action Dataset: a videogame-based dataset for Synthetic→Real domain adaptation for human activity recognition.
1 PAPER • NO BENCHMARKS YET
UDA-CH contains 16 objects that cover a variety of artworks which can be found in a museum like sculptures, paintings and books. Specifically, the dataset has been collected inside the cultural site “Galleria Regionale di Palazzo Bellomo” located in Siracusa, Italy.
1 PAPER • NO BENCHMARKS YET