The CIFAR-10 dataset (Canadian Institute for Advanced Research, 10 classes) is a subset of the Tiny Images dataset and consists of 60000 32x32 color images. The images are labelled with one of 10 mutually exclusive classes: airplane, automobile (but not truck or pickup truck), bird, cat, deer, dog, frog, horse, ship, and truck (but not pickup truck). There are 6000 images per class with 5000 training and 1000 testing images per class.
13,038 PAPERS • 74 BENCHMARKS
The ImageNet dataset contains 14,197,122 annotated images according to the WordNet hierarchy. Since 2010 the dataset is used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a benchmark in image classification and object detection. The publicly released dataset contains a set of manually annotated training images. A set of test images is also released, with the manual annotations withheld. ILSVRC annotations fall into one of two categories: (1) image-level annotation of a binary label for the presence or absence of an object class in the image, e.g., “there are cars in this image” but “there are no tigers,” and (2) object-level annotation of a tight bounding box and class label around an object instance in the image, e.g., “there is a screwdriver centered at position (20,25) with width of 50 pixels and height of 30 pixels”. The ImageNet project does not own the copyright of the images, therefore only thumbnails and URLs of images are provided.
12,559 PAPERS • 49 BENCHMARKS
The CIFAR-100 dataset (Canadian Institute for Advanced Research, 100 classes) is a subset of the Tiny Images dataset and consists of 60000 32x32 color images. The 100 classes in the CIFAR-100 are grouped into 20 superclasses. There are 600 images per class. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs). There are 500 training images and 100 testing images per class.
6,928 PAPERS • 50 BENCHMARKS
Fashion-MNIST is a dataset comprising of 28×28 grayscale images of 70,000 fashion products from 10 categories, with 7,000 images per category. The training set has 60,000 images and the test set has 10,000 images. Fashion-MNIST shares the same image size, data format and the structure of training and testing splits with the original MNIST.
2,571 PAPERS • 18 BENCHMARKS
The Stanford Sentiment Treebank is a corpus with fully labeled parse trees that allows for a complete analysis of the compositional effects of sentiment in language. The corpus is based on the dataset introduced by Pang and Lee (2005) and consists of 11,855 single sentences extracted from movie reviews. It was parsed with the Stanford parser and includes a total of 215,154 unique phrases from those parse trees, each annotated by 3 human judges.
1,855 PAPERS • 6 BENCHMARKS
The Places dataset is proposed for scene recognition and contains more than 2.5 million images covering more than 205 scene categories with more than 5,000 images per category.
969 PAPERS • 6 BENCHMARKS
The STL-10 is an image dataset derived from ImageNet and popularly used to evaluate algorithms of unsupervised feature learning or self-taught learning. Besides 100,000 unlabeled images, it contains 13,000 labeled images from 10 object classes (such as birds, cats, trucks), among which 5,000 images are partitioned for training while the remaining 8,000 images for testing. All the images are color images with 96×96 pixels in size.
897 PAPERS • 17 BENCHMARKS
The Describable Textures Dataset (DTD) contains 5640 texture images in the wild. They are annotated with human-centric attributes inspired by the perceptual properties of textures.
547 PAPERS • 5 BENCHMARKS
The iNaturalist 2017 dataset (iNat) contains 675,170 training and validation images from 5,089 natural fine-grained categories. Those categories belong to 13 super-categories including Plantae (Plant), Insecta (Insect), Aves (Bird), Mammalia (Mammal), and so on. The iNat dataset is highly imbalanced with dramatically different number of images per category. For example, the largest super-category “Plantae (Plant)” has 196,613 images from 2,101 categories; whereas the smallest super-category “Protozoa” only has 381 images from 4 categories.
443 PAPERS • 10 BENCHMARKS
iSUN is a ground truth of gaze traces on images from the SUN dataset. The collection is partitioned into 6,000 images for training, 926 for validation and 2,000 for test.
77 PAPERS • NO BENCHMARKS YET
The Places365 dataset is a scene recognition dataset. It is composed of 10 million images comprising 434 scene classes. There are two versions of the dataset: Places365-Standard with 1.8 million train and 36000 validation images from K=365 scene classes, and Places365-Challenge-2016, in which the size of the training set is increased up to 6.2 million extra images, including 69 new scene classes (leading to a total of 8 million train images from 434 scene classes).
51 PAPERS • 8 BENCHMARKS
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups.
23 PAPERS • 4 BENCHMARKS
A benchmark dataset for out-of-distribution detection. ImageNet-1k is in-distribution, while Textures is out-of-distribution.
23 PAPERS • 1 BENCHMARK
A benchmark dataset for out-of-distribution detection. ImageNet-1k is in-distribution, while iNaturalist is out-of-distribution.
21 PAPERS • 1 BENCHMARK
A benchmark dataset for out-of-distribution detection. ImageNet-1k is in-distribution, while Places is out-of-distribution.
19 PAPERS • 1 BENCHMARK
A benchmark dataset for out-of-distribution detection. ImageNet-1k is in-distribution, while SUN is out-of-distribution.
17 PAPERS • 1 BENCHMARK
It is manually annotated, comes with a naturally diverse distribution, and has a large scale. It is built to overcome several shortcomings of existing OOD benchmarks. OpenImage-O is image-by-image filtered from the test set of OpenImage-V3, which has been collected from Flickr without a predefined list of class names or tags, leading to natural class statistics and avoiding an initial design bias.
14 PAPERS • NO BENCHMARKS YET
A large-scale curated dataset of over 152 million tweets, growing daily, related to COVID-19 chatter generated from January 1st to April 4th at the time of writing.
7 PAPERS • 4 BENCHMARKS
OpenImage-O is built for the ID dataset ImageNet-1k. It is manually annotated, comes with a naturally diverse distribution, and has a large scale. It is built to overcome several shortcomings of existing OOD benchmarks. OpenImage-O is image-by-image filtered from the test set of OpenImage-V3, which has been collected from Flickr without a predefined list of class names or tags, leading to natural class statistics and avoiding an initial design bias.
4 PAPERS • 1 BENCHMARK
Pano3D is a new benchmark for depth estimation from spherical panoramas. Its goal is to drive progress for this task in a consistent and holistic manner. The Pano3D 360 depth estimation benchmark provides a standard Matterport3D train and test split, as well as a secondary GibsonV2 partioning for testing and training as well. The latter is used for zero-shot cross dataset transfer performance assessment and decomposes it into 3 different splits, each one focusing on a specific generalization axis.
2 PAPERS • NO BENCHMARKS YET
Icons-50 is a dataset for studying surface variation robustness.
1 PAPER • NO BENCHMARKS YET
This dataset was presented as part of the ICLR 2023 paper 𝘈 𝘧𝘳𝘢𝘮𝘦𝘸𝘰𝘳𝘬 𝘧𝘰𝘳 𝘣𝘦𝘯𝘤𝘩𝘮𝘢𝘳𝘬𝘪𝘯𝘨 𝘊𝘭𝘢𝘴𝘴-𝘰𝘶𝘵-𝘰𝘧-𝘥𝘪𝘴𝘵𝘳𝘪𝘣𝘶𝘵𝘪𝘰𝘯 𝘥𝘦𝘵𝘦𝘤𝘵𝘪𝘰𝘯 𝘢𝘯𝘥 𝘪𝘵𝘴 𝘢𝘱𝘱𝘭𝘪𝘤𝘢𝘵𝘪𝘰𝘯 𝘵𝘰 𝘐𝘮𝘢𝘨𝘦𝘕𝘦𝘵.
1 PAPER • 1 BENCHMARK
A genomics dataset for OOD detection that allows other researchers to benchmark progress on this important problem.
1 PAPER • NO BENCHMARKS YET
Simulated pulse Doppler radar signatures for four classes of helicopter-like targets. The classes differ in the number of rotating blades each kind of target carries, thus each class translates into a specific modulation pattern on the Doppler signature. Doppler signatures are a typical feature used to achieve radar targets discrimination. This dataset was generated using a simple open-source MATLAB simulation code, which can be easily modified to generate custom datasets with more classes and increased intra-class diversity.
1 PAPER • NO BENCHMARKS YET