The ImageNet dataset contains 14,197,122 images annotated according to the WordNet hierarchy. Since 2010 the dataset has been used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a benchmark in image classification and object detection. The publicly released dataset contains a set of manually annotated training images. A set of test images is also released, with the manual annotations withheld. ILSVRC annotations fall into one of two categories: (1) image-level annotation of a binary label for the presence or absence of an object class in the image, e.g., “there are cars in this image” but “there are no tigers,” and (2) object-level annotation of a tight bounding box and class label around an object instance in the image, e.g., “there is a screwdriver centered at position (20,25) with width of 50 pixels and height of 30 pixels”. The ImageNet project does not own the copyright of the images, so only thumbnails and URLs of images are provided.
5,569 PAPERS • 56 BENCHMARKS
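The object-level example in the ImageNet description gives a box by its center, width, and height. As a minimal sketch, assuming pixel coordinates with the origin at the top-left corner, such a box can be converted to corner coordinates as below. The helper names `center_to_corners` and `clip_box` and the 100×100 image size are hypothetical and are not part of any ImageNet tooling; the actual ILSVRC annotation files may use a different box convention.

```python
def center_to_corners(cx, cy, width, height):
    """Convert a center-format box to (x_min, y_min, x_max, y_max)."""
    half_w = width / 2
    half_h = height / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def clip_box(box, image_width, image_height):
    """Clip a corner-format box to the image bounds."""
    x_min, y_min, x_max, y_max = box
    return (
        max(0.0, x_min),
        max(0.0, y_min),
        min(float(image_width), x_max),
        min(float(image_height), y_max),
    )


# The screwdriver example from the description: center (20, 25), 50 wide, 30 high.
box = center_to_corners(20, 25, 50, 30)
print(box)

# Hypothetical 100x100 image, used only to illustrate clipping.
print(clip_box(box, 100, 100))
```

With the numbers quoted in the description, the left edge comes out at x = -5, so the box would extend past the image border; this suggests the quoted values are illustrative, and a pipeline consuming such boxes would typically clip them as shown.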
The CIFAR-100 dataset (Canadian Institute for Advanced Research, 100 classes) is a subset of the Tiny Images dataset and consists of 60,000 32×32 color images. The 100 classes in CIFAR-100 are grouped into 20 superclasses. There are 600 images per class, split into 500 training images and 100 testing images. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs).
2,497 PAPERS • 28 BENCHMARKS
The Caltech-UCSD Birds-200-2011 (CUB-200-2011) dataset is the most widely used dataset for the fine-grained visual categorization task. It contains 11,788 images of 200 bird subcategories, 5,994 for training and 5,794 for testing. Each image has detailed annotations: 1 subcategory label, 15 part locations, 312 binary attributes and 1 bounding box. The textual information comes from Reed et al., who expanded the CUB-200-2011 dataset by collecting fine-grained natural language descriptions. Ten single-sentence descriptions are collected for each image. The natural language descriptions are collected through the Amazon Mechanical Turk (AMT) platform, and are required to contain at least 10 words, without any information about subcategories or actions.
853 PAPERS • 26 BENCHMARKS
The miniImageNet dataset contains 100 classes randomly chosen from the ImageNet ILSVRC-2012 challenge, with 600 images of size 84×84 pixels per class. It is split into 64 base classes, 16 validation classes and 20 novel classes.
543 PAPERS • 19 BENCHMARKS
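The miniImageNet class split can be pictured as a disjoint partition of class ids. The following is only a sketch under stated assumptions: the class ids are synthetic integers, the function name `split_classes` and the seed are hypothetical, and the shuffle does not reproduce the class lists actually used in published miniImageNet splits. Only the counts (100 classes, 600 images per class, 64/16/20) come from the description.

```python
import random

NUM_CLASSES = 100
IMAGES_PER_CLASS = 600
SPLIT_SIZES = {"base": 64, "validation": 16, "novel": 20}


def split_classes(num_classes, split_sizes, seed=0):
    """Shuffle class ids and cut them into disjoint, named splits."""
    if sum(split_sizes.values()) != num_classes:
        raise ValueError("split sizes must sum to the number of classes")
    class_ids = list(range(num_classes))
    random.Random(seed).shuffle(class_ids)
    splits, start = {}, 0
    for name, size in split_sizes.items():
        splits[name] = class_ids[start:start + size]
        start += size
    return splits


splits = split_classes(NUM_CLASSES, SPLIT_SIZES)
for name, ids in splits.items():
    print(name, len(ids), "classes,", len(ids) * IMAGES_PER_CLASS, "images")
```

Because the split is over classes rather than images, no class seen in the base split appears among the novel classes, which is what makes the benchmark a few-shot test.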
Omniglot is a large dataset of hand-written characters with 1623 characters and 20 examples for each character. These characters are collected from 50 alphabets from different countries. It contains both image and stroke data. Stroke data are coordinates with time in milliseconds.
330 PAPERS • 14 BENCHMARKS
Oxford 102 Flower is an image classification dataset consisting of 102 flower categories. The flowers were chosen to be ones commonly occurring in the United Kingdom. Each class consists of between 40 and 258 images.
317 PAPERS • 10 BENCHMARKS
Caltech-256 is an object recognition dataset containing 30,607 real-world images, of different sizes, spanning 257 classes (256 object classes and an additional clutter class). Each class is represented by at least 80 images. The dataset is a superset of the Caltech-101 dataset.
251 PAPERS • 3 BENCHMARKS
Animals with Attributes (AwA) was a dataset for benchmarking transfer-learning algorithms, in particular attribute-based classification. It consisted of 30475 images of 50 animal classes with six pre-extracted feature representations for each image. The animal classes are aligned with Osherson's classical class/attribute matrix, thereby providing 85 numeric attribute values for each class. Using the shared attributes, it is possible to transfer information between different classes. The Animals with Attributes dataset was suspended. Its images are no longer available because of copyright restrictions. A drop-in replacement, Animals with Attributes 2, is available instead.
185 PAPERS • 6 BENCHMARKS
The Stanford Cars dataset consists of 196 classes of cars with a total of 16,185 images, taken from the rear. The data is divided into almost a 50-50 train/test split with 8,144 training images and 8,041 testing images. Categories are typically at the level of Make, Model, Year. The images are 360×240.
180 PAPERS • 8 BENCHMARKS
The tieredImageNet dataset is a larger subset of ILSVRC-12 with 608 classes (779,165 images) grouped into 34 higher-level nodes in the ImageNet human-curated hierarchy. This set of nodes is partitioned into 20, 6, and 8 disjoint sets of training, validation, and testing nodes, and the corresponding classes form the respective meta-sets. As argued in Ren et al. (2018), this split near the root of the ImageNet hierarchy results in a more challenging, yet realistic regime with test classes that are less similar to training classes.
114 PAPERS • 5 BENCHMARKS
aPY is a coarse-grained dataset composed of 15339 images from 3 broad categories (animals, objects and vehicles), further divided into a total of 32 subcategories (aeroplane, …, zebra).
102 PAPERS • 1 BENCHMARK
The iNaturalist 2017 dataset (iNat) contains 675,170 training and validation images from 5,089 natural fine-grained categories. Those categories belong to 13 super-categories including Plantae (Plant), Insecta (Insect), Aves (Bird), Mammalia (Mammal), and so on. The iNat dataset is highly imbalanced, with dramatically different numbers of images per category. For example, the largest super-category “Plantae (Plant)” has 196,613 images from 2,101 categories, whereas the smallest super-category “Protozoa” has only 381 images from 4 categories.
97 PAPERS • 4 BENCHMARKS
CIFAR100 few-shots (CIFAR-FS) is randomly sampled from CIFAR-100 (Krizhevsky & Hinton, 2009) using the same criteria that were used to generate miniImageNet. The average inter-class similarity is sufficiently high to represent a challenge for the current state of the art. Moreover, the limited original resolution of 32×32 makes the task harder and at the same time allows fast prototyping.
67 PAPERS • 2 BENCHMARKS
The FC100 dataset (Fewshot-CIFAR100) is a newly split dataset based on CIFAR-100 for few-shot learning. It contains 20 high-level categories, which are divided into 12, 4, and 4 categories for training, validation, and test, respectively. The corresponding splits contain 60, 20, and 20 low-level classes, each with 600 images of size 32 × 32. The small image size makes few-shot learning more challenging.
58 PAPERS • 5 BENCHMARKS
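FC100 splits at the level of high-level categories, so all low-level classes under one category land in the same split. The sketch below illustrates that idea with synthetic ids. It assumes 5 low-level classes per high-level category, which follows from the 60/20/20 class counts quoted above, and it assigns categories to splits by index purely for illustration; the real FC100 assignment of categories to splits is not reproduced here.

```python
SUPERCLASSES = 20
CLASSES_PER_SUPERCLASS = 5  # 100 classes / 20 high-level categories
IMAGES_PER_CLASS = 600

# Synthetic hierarchy: superclass id -> list of low-level class ids.
hierarchy = {
    s: [s * CLASSES_PER_SUPERCLASS + c for c in range(CLASSES_PER_SUPERCLASS)]
    for s in range(SUPERCLASSES)
}

# Hypothetical assignment by index: 12 train, 4 validation, 4 test categories.
superclass_split = {
    "train": range(0, 12),
    "validation": range(12, 16),
    "test": range(16, 20),
}

# Every low-level class inherits the split of its high-level category.
class_split = {
    name: [c for s in supers for c in hierarchy[s]]
    for name, supers in superclass_split.items()
}

for name, classes in class_split.items():
    print(name, len(classes), "classes,", len(classes) * IMAGES_PER_CLASS, "images")
```

Splitting above the class level keeps semantically close classes out of different splits, which is the property the description credits for the benchmark's difficulty.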
The Meta-Dataset benchmark is a large few-shot learning benchmark and consists of multiple datasets of different data distributions. It does not restrict few-shot tasks to have fixed ways and shots, thus representing a more realistic scenario. It consists of 10 datasets from diverse domains:
40 PAPERS • 2 BENCHMARKS
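To make "not fixed ways and shots" concrete, the following sketch samples one few-shot episode in which both the number of classes (ways) and the number of support examples per class (shots) vary. It is an illustration only and is not Meta-Dataset's actual sampling procedure; the function `sample_episode`, its parameters, and the toy data are all hypothetical.

```python
import random


def sample_episode(images_by_class, rng, max_ways=5, max_shots=5, num_queries=2):
    """Sample one episode with a random number of ways and per-class shots."""
    # A class needs enough images for its queries plus at least one shot.
    eligible = [c for c, imgs in images_by_class.items() if len(imgs) > num_queries]
    if len(eligible) < 2:
        raise ValueError("need at least two classes with enough images")
    ways = rng.randint(2, min(max_ways, len(eligible)))
    classes = rng.sample(eligible, ways)
    support, query = {}, {}
    for c in classes:
        imgs = list(images_by_class[c])
        rng.shuffle(imgs)
        query[c] = imgs[:num_queries]
        # The shot count is drawn independently for each class.
        shots = rng.randint(1, min(max_shots, len(imgs) - num_queries))
        support[c] = imgs[num_queries:num_queries + shots]
    return support, query


# Toy data: 8 classes with differing numbers of "images" (string ids only).
toy = {f"class_{i}": [f"img_{i}_{j}" for j in range(4 + i)] for i in range(8)}
support, query = sample_episode(toy, random.Random(0))
for c in support:
    print(c, "shots:", len(support[c]), "queries:", len(query[c]))
```

A fixed-way, fixed-shot benchmark would instead hard-code `ways` and `shots`; letting them vary per episode is one simple way to expose a learner to tasks of different sizes.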
The Stanford Dogs dataset contains 20,580 images of 120 classes of dogs from around the world, which are divided into 12,000 images for training and 8,580 images for testing.
3 PAPERS • 5 BENCHMARKS
ORBIT is a few-shot dataset and benchmark grounded in a real-world application of teachable object recognizers for people who are blind/low-vision. The dataset contains 3,822 videos of 486 objects recorded by people who are blind/low-vision on their mobile phones, and the benchmark reflects a realistic, highly challenging recognition problem, providing a rich playground to drive research in robustness to few-shot, high-variation conditions.
1 PAPER • NO BENCHMARKS YET