The CIFAR-10 dataset (Canadian Institute for Advanced Research, 10 classes) is a subset of the Tiny Images dataset and consists of 60000 32x32 color images. The images are labelled with one of 10 mutually exclusive classes: airplane, automobile (but not truck or pickup truck), bird, cat, deer, dog, frog, horse, ship, and truck (but not pickup truck). There are 6000 images per class with 5000 training and 1000 testing images per class.
14,191 PAPERS • 98 BENCHMARKS
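The per-class counts in the CIFAR-10 description imply the overall split sizes. The following is a minimal sketch of that arithmetic; the function name `split_sizes` is hypothetical and not part of any dataset loader.

```python
# Per-class counts as stated in the CIFAR-10 description.
NUM_CLASSES = 10
TRAIN_PER_CLASS = 5000
TEST_PER_CLASS = 1000


def split_sizes(num_classes, train_per_class, test_per_class):
    """Return (train_total, test_total, grand_total) for a class-balanced dataset."""
    train_total = num_classes * train_per_class
    test_total = num_classes * test_per_class
    return train_total, test_total, train_total + test_total


train_total, test_total, total = split_sizes(
    NUM_CLASSES, TRAIN_PER_CLASS, TEST_PER_CLASS
)
print(train_total, test_total, total)  # 50000 10000 60000
```

The same helper applies to any class-balanced dataset in this list, for example CIFAR-100 with 100 classes, 500 training and 100 testing images per class.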
The CIFAR-100 dataset (Canadian Institute for Advanced Research, 100 classes) is a subset of the Tiny Images dataset and consists of 60,000 32x32 color images. The 100 classes in CIFAR-100 are grouped into 20 superclasses. There are 600 images per class, split into 500 training images and 100 testing images. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs).
7,721 PAPERS • 52 BENCHMARKS
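The fine/coarse labelling of CIFAR-100 can be pictured as a lookup from class to superclass. The sketch below is illustrative only: it lists three of the 20 superclasses, the identifier spellings (such as `aquatic_mammals`) are assumptions, and the names `COARSE_TO_FINE`, `FINE_TO_COARSE`, and `coarse_label` are hypothetical rather than part of the dataset's distribution format.

```python
# Fragment of the superclass grouping; only 3 of the 20 superclasses are shown.
# Identifier spellings are illustrative assumptions.
COARSE_TO_FINE = {
    "aquatic_mammals": ["beaver", "dolphin", "otter", "seal", "whale"],
    "fish": ["aquarium_fish", "flatfish", "ray", "shark", "trout"],
    "flowers": ["orchid", "poppy", "rose", "sunflower", "tulip"],
}

# Invert the grouping so each fine label points at its coarse label.
FINE_TO_COARSE = {
    fine: coarse
    for coarse, fines in COARSE_TO_FINE.items()
    for fine in fines
}


def coarse_label(fine):
    """Look up the superclass ("coarse" label) of a class ("fine" label)."""
    return FINE_TO_COARSE[fine]


print(coarse_label("dolphin"))  # aquatic_mammals
```

Because every class belongs to exactly one superclass, the inverted dictionary is well defined and a coarse label never needs to be stored separately from the fine one.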
Fashion-MNIST is a dataset of 28×28 grayscale images of 70,000 fashion products from 10 categories, with 7,000 images per category. The training set has 60,000 images and the test set has 10,000 images. Fashion-MNIST shares the same image size, data format, and training/testing split structure as the original MNIST.
2,800 PAPERS • 17 BENCHMARKS
STL-10 is an image dataset derived from ImageNet and widely used to evaluate algorithms for unsupervised feature learning or self-taught learning. Besides 100,000 unlabeled images, it contains 13,000 labeled images from 10 object classes (such as birds, cats, and trucks), of which 5,000 are set aside for training and the remaining 8,000 for testing. All images are color images of 96×96 pixels.
971 PAPERS • 17 BENCHMARKS
iSUN provides ground-truth gaze traces on images from the SUN dataset. The collection is partitioned into 6,000 images for training, 926 for validation, and 2,000 for testing.
88 PAPERS • NO BENCHMARKS YET
The Places365 dataset is a scene recognition dataset. It is composed of 10 million images comprising 434 scene classes. There are two versions of the dataset: Places365-Standard, with 1.8 million training and 36,000 validation images from K=365 scene classes, and Places365-Challenge-2016, which adds 6.2 million extra training images, including 69 new scene classes (for a total of 8 million training images from 434 scene classes).
56 PAPERS • 8 BENCHMARKS
A benchmark dataset for out-of-distribution detection. ImageNet-1k is in-distribution, while Textures is out-of-distribution.
24 PAPERS • 1 BENCHMARK
A benchmark dataset for out-of-distribution detection. ImageNet-1k is in-distribution, while Places is out-of-distribution.
20 PAPERS • 1 BENCHMARK
A benchmark dataset for out-of-distribution detection. ImageNet-1k is in-distribution, while SUN is out-of-distribution.
18 PAPERS • 1 BENCHMARK
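The three out-of-distribution (OOD) benchmarks above share one setup: a detector assigns each image a score, and ImageNet-1k images should score higher than images from the OOD set (Textures, Places, or SUN). The entries do not state which metric is reported; AUROC is a common choice and is used here as an assumption. The sketch below computes it directly from its pairwise definition, with made-up scores and a hypothetical function name `auroc`.

```python
def auroc(id_scores, ood_scores):
    """Fraction of (ID, OOD) pairs in which the ID sample scores higher.

    Ties count as half a win. This is the pairwise definition of the
    area under the ROC curve, treating in-distribution as the positive class.
    """
    wins = 0.0
    for s_id in id_scores:
        for s_ood in ood_scores:
            if s_id > s_ood:
                wins += 1.0
            elif s_id == s_ood:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


# Hypothetical detector scores (higher means "more in-distribution").
id_scores = [0.9, 0.8, 0.7, 0.6]    # e.g. ImageNet-1k validation images
ood_scores = [0.65, 0.3, 0.2]       # e.g. images from the OOD set

print(round(auroc(id_scores, ood_scores), 4))
```

A value of 1.0 would mean every in-distribution image outscores every OOD image, while 0.5 corresponds to a detector that cannot tell the two sets apart. The quadratic loop is fine for illustration; a rank-based implementation would be preferable at benchmark scale.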
The NINCO (No ImageNet Class Objects) dataset was introduced in the ICML 2023 paper "In or Out? Fixing ImageNet Out-of-Distribution Detection Evaluation". The images in this dataset are free from objects that belong to any of the 1000 classes of ImageNet-1K (ILSVRC2012), which makes NINCO suitable for evaluating out-of-distribution detection on ImageNet-1K.
6 PAPERS • NO BENCHMARKS YET
OpenImage-O is an out-of-distribution (OOD) dataset built for use with ImageNet-1k as the in-distribution (ID) dataset. It is manually annotated, has a naturally diverse distribution, and is large in scale. It is designed to overcome several shortcomings of existing OOD benchmarks. OpenImage-O is filtered image by image from the test set of OpenImage-V3, which was collected from Flickr without a predefined list of class names or tags, leading to natural class statistics and avoiding an initial design bias.
4 PAPERS • 1 BENCHMARK
2 PAPERS • 1 BENCHMARK
Pano3D is a new benchmark for depth estimation from spherical panoramas. Its goal is to drive progress for this task in a consistent and holistic manner. The Pano3D 360 depth estimation benchmark provides a standard Matterport3D train and test split, as well as a secondary GibsonV2 partitioning for training and testing. The latter is used to assess zero-shot cross-dataset transfer performance and is decomposed into 3 different splits, each focusing on a specific generalization axis.
2 PAPERS • NO BENCHMARKS YET
Icons-50 is a dataset for studying surface variation robustness.
1 PAPER • NO BENCHMARKS YET