The CIFAR-10 dataset (Canadian Institute for Advanced Research, 10 classes) is a subset of the Tiny Images dataset and consists of 60,000 32×32 color images. The images are labelled with one of 10 mutually exclusive classes: airplane, automobile (but not truck or pickup truck), bird, cat, deer, dog, frog, horse, ship, and truck (but not pickup truck). There are 6,000 images per class, with 5,000 training and 1,000 testing images per class.
13,038 PAPERS • 74 BENCHMARKS
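The python-version CIFAR batch files store each image as one flattened 3072-value row: 1024 red values, then 1024 green, then 1024 blue, each channel in row-major order. A minimal sketch of recovering the 32×32×3 image from such a row, using a synthetic array here since the real batch files must be downloaded:

```python
import numpy as np

def cifar_row_to_image(row):
    """Reshape one flattened CIFAR row (3072 uint8 values: 1024 red,
    then 1024 green, then 1024 blue, each in row-major order) into a
    32x32x3 height-width-channel image."""
    assert row.shape == (3072,)
    return row.reshape(3, 32, 32).transpose(1, 2, 0)

# Synthetic stand-in for one row of the b"data" array in a CIFAR batch file.
row = (np.arange(3072) % 256).astype(np.uint8)
img = cifar_row_to_image(row)
print(img.shape)  # (32, 32, 3)
```

The same row layout applies to CIFAR-100, which differs only in its label files.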
The ImageNet dataset contains 14,197,122 annotated images organized according to the WordNet hierarchy. Since 2010 the dataset has been used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a benchmark in image classification and object detection. The publicly released dataset contains a set of manually annotated training images. A set of test images is also released, with the manual annotations withheld. ILSVRC annotations fall into one of two categories: (1) image-level annotation of a binary label for the presence or absence of an object class in the image, e.g., "there are cars in this image" but "there are no tigers," and (2) object-level annotation of a tight bounding box and class label around an object instance in the image, e.g., "there is a screwdriver centered at position (20,25) with width of 50 pixels and height of 30 pixels". The ImageNet project does not own the copyright of the images; therefore, only thumbnails and URLs of images are provided.
12,559 PAPERS • 49 BENCHMARKS
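The object-level annotation convention in the screwdriver example (a center point plus width and height) can be converted to the corner coordinates most tools expect with a few lines. A small sketch; the function name is illustrative, not part of any ImageNet toolkit:

```python
def center_box_to_corners(cx, cy, w, h):
    """Convert a (center_x, center_y, width, height) box annotation to
    (x_min, y_min, x_max, y_max) corner coordinates."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

# The screwdriver example above: centered at (20, 25), width 50, height 30.
corners = center_box_to_corners(20, 25, 50, 30)
print(corners)  # (-5.0, 10.0, 45.0, 40.0)
```

Note that a box centered near an image border can yield negative corner coordinates, as here, which callers typically clamp to the image bounds.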
The CIFAR-100 dataset (Canadian Institute for Advanced Research, 100 classes) is a subset of the Tiny Images dataset and consists of 60,000 32×32 color images. The 100 classes in CIFAR-100 are grouped into 20 superclasses. There are 600 images per class. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs). There are 500 training images and 100 testing images per class.
6,928 PAPERS • 50 BENCHMARKS
Fashion-MNIST is a dataset comprising 28×28 grayscale images of 70,000 fashion products from 10 categories, with 7,000 images per category. The training set has 60,000 images and the test set has 10,000 images. Fashion-MNIST shares the same image size, data format, and training/testing split structure with the original MNIST.
2,571 PAPERS • 18 BENCHMARKS
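Because Fashion-MNIST reuses MNIST's IDX binary format, its image files start with the same 16-byte big-endian header: a magic number (2051 for image files), the image count, and the row and column dimensions. A sketch of parsing that header with the standard library, using a synthetic buffer rather than a downloaded file:

```python
import struct

def parse_idx_image_header(buf):
    """Parse the 16-byte big-endian header of an IDX image file (the
    binary format shared by MNIST and Fashion-MNIST): magic number,
    image count, rows, and columns."""
    magic, count, rows, cols = struct.unpack(">IIII", buf[:16])
    if magic != 2051:
        raise ValueError("not an IDX image file")
    return count, rows, cols

# Synthetic header mimicking train-images-idx3-ubyte: 60,000 28x28 images.
header = struct.pack(">IIII", 2051, 60000, 28, 28)
print(parse_idx_image_header(header))  # (60000, 28, 28)
```

The pixel bytes follow the header as one unsigned byte per pixel, image by image.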
The Stanford Sentiment Treebank is a corpus with fully labeled parse trees that allows for a complete analysis of the compositional effects of sentiment in language. The corpus is based on the dataset introduced by Pang and Lee (2005) and consists of 11,855 single sentences extracted from movie reviews. It was parsed with the Stanford parser and includes a total of 215,154 unique phrases from those parse trees, each annotated by 3 human judges.
1,855 PAPERS • 6 BENCHMARKS
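SST trees are distributed as bracketed strings with a 0-4 sentiment label at every node, e.g. `(3 (2 A) (4 (2 great) (2 film)))`. A minimal sketch of extracting the root label and leaf tokens from one such string; the example tree is hand-made for illustration, not an actual corpus line:

```python
import re

def sst_root_and_tokens(tree):
    """From an SST-style bracketed tree, return the root sentiment
    label (a single digit, 0-4) and the leaf tokens in order."""
    root = int(tree[1])  # the label immediately after the first '('
    # Leaves are the innermost "(label token)" pairs with no nested parens.
    tokens = re.findall(r"\(\d ([^()]+)\)", tree)
    return root, tokens

root, tokens = sst_root_and_tokens("(3 (2 A) (4 (2 great) (2 film)))")
print(root, tokens)  # 3 ['A', 'great', 'film']
```

A full treebank reader would also walk the inner nodes to recover the 215,154 labeled phrases; this sketch only touches the root and the leaves.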
The Places dataset is proposed for scene recognition and contains more than 2.5 million images covering more than 205 scene categories with more than 5,000 images per category.
969 PAPERS • 6 BENCHMARKS
STL-10 is an image dataset derived from ImageNet and popularly used to evaluate algorithms for unsupervised feature learning or self-taught learning. Besides 100,000 unlabeled images, it contains 13,000 labeled images from 10 object classes (such as birds, cats, and trucks), of which 5,000 are used for training and the remaining 8,000 for testing. All images are color, 96×96 pixels in size.
897 PAPERS • 17 BENCHMARKS
The Describable Textures Dataset (DTD) contains 5,640 texture images in the wild. They are annotated with human-centric attributes inspired by the perceptual properties of textures.
547 PAPERS • 5 BENCHMARKS
The iNaturalist 2017 dataset (iNat) contains 675,170 training and validation images from 5,089 natural fine-grained categories. Those categories belong to 13 super-categories, including Plantae (Plant), Insecta (Insect), Aves (Bird), Mammalia (Mammal), and so on. The iNat dataset is highly imbalanced, with dramatically different numbers of images per category. For example, the largest super-category, "Plantae (Plant)", has 196,613 images from 2,101 categories, whereas the smallest super-category, "Protozoa", has only 381 images from 4 categories.
443 PAPERS • 10 BENCHMARKS
iSUN is a dataset of ground-truth gaze traces on images from the SUN dataset. The collection is partitioned into 6,000 images for training, 926 for validation, and 2,000 for testing.
77 PAPERS • NO BENCHMARKS YET
The Places365 dataset is a scene recognition dataset composed of 10 million images spanning 434 scene classes. There are two versions of the dataset: Places365-Standard, with 1.8 million training and 36,000 validation images from K=365 scene classes, and Places365-Challenge-2016, which adds 6.2 million extra training images and 69 new scene classes (leading to a total of 8 million training images from 434 scene classes).
51 PAPERS • 8 BENCHMARKS
The 20 Newsgroups dataset is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups.
23 PAPERS • 4 BENCHMARKS
A benchmark dataset for out-of-distribution detection. ImageNet-1k is in-distribution, while Textures is out-of-distribution.
23 PAPERS • 1 BENCHMARK
A benchmark dataset for out-of-distribution detection. ImageNet-1k is in-distribution, while iNaturalist is out-of-distribution.
21 PAPERS • 1 BENCHMARK
A benchmark dataset for out-of-distribution detection. ImageNet-1k is in-distribution, while Places is out-of-distribution.
19 PAPERS • 1 BENCHMARK
A benchmark dataset for out-of-distribution detection. ImageNet-1k is in-distribution, while SUN is out-of-distribution.
17 PAPERS • 1 BENCHMARK
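These ImageNet-1k OOD pairings are commonly scored with AUROC: the probability that a detector assigns a higher confidence score to a random in-distribution sample than to a random out-of-distribution one. A self-contained sketch with synthetic scores, computed by direct pairwise comparison rather than a library call:

```python
def auroc(id_scores, ood_scores):
    """AUROC for OOD detection: the probability that a randomly chosen
    in-distribution sample scores higher than a randomly chosen
    out-of-distribution sample (ties count as half a win)."""
    wins = sum((i > o) + 0.5 * (i == o)
               for i in id_scores for o in ood_scores)
    return wins / (len(id_scores) * len(ood_scores))

# Synthetic detector confidences: ID samples should tend to score higher.
id_scores = [0.9, 0.8, 0.7, 0.6]
ood_scores = [0.5, 0.4, 0.65]
print(auroc(id_scores, ood_scores))  # 11/12, about 0.917
```

The O(n·m) pairwise loop is fine for a sketch; real evaluations on tens of thousands of scores would use a rank-based formulation instead.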
OpenImage-O is manually annotated, comes with a naturally diverse distribution, and has a large scale. It was built to overcome several shortcomings of existing OOD benchmarks. OpenImage-O is filtered image by image from the test set of OpenImage-V3, which was collected from Flickr without a predefined list of class names or tags, leading to natural class statistics and avoiding an initial design bias.
14 PAPERS • NO BENCHMARKS YET
A large-scale curated dataset of over 152 million tweets related to COVID-19 chatter, generated from January 1st to April 4th (at the time of writing) and growing daily.
7 PAPERS • 4 BENCHMARKS
OpenImage-O is an out-of-distribution (OOD) dataset built for ImageNet-1k as the in-distribution dataset. It is manually annotated, comes with a naturally diverse distribution, and has a large scale. It was built to overcome several shortcomings of existing OOD benchmarks. OpenImage-O is filtered image by image from the test set of OpenImage-V3, which was collected from Flickr without a predefined list of class names or tags, leading to natural class statistics and avoiding an initial design bias.
4 PAPERS • 1 BENCHMARK
Pano3D is a new benchmark for depth estimation from spherical panoramas. Its goal is to drive progress on this task in a consistent and holistic manner. The Pano3D 360 depth estimation benchmark provides a standard Matterport3D train/test split, as well as a secondary GibsonV2 partitioning for training and testing. The latter is used to assess zero-shot cross-dataset transfer performance and is decomposed into three splits, each focusing on a specific generalization axis.
2 PAPERS • NO BENCHMARKS YET
Icons-50 is a dataset for studying surface variation robustness.
1 PAPER • NO BENCHMARKS YET
This dataset was presented as part of the ICLR 2023 paper "A framework for benchmarking Class-out-of-distribution detection and its application to ImageNet".
1 PAPER • 1 BENCHMARK
A genomics dataset for OOD detection that allows other researchers to benchmark progress on this important problem.
Simulated pulse-Doppler radar signatures for four classes of helicopter-like targets. The classes differ in the number of rotating blades each kind of target carries; thus each class translates into a specific modulation pattern in the Doppler signature. Doppler signatures are a typical feature used for radar target discrimination. This dataset was generated using a simple open-source MATLAB simulation code, which can easily be modified to generate custom datasets with more classes and increased intra-class diversity.