The CIFAR-10 dataset (Canadian Institute for Advanced Research, 10 classes) is a subset of the Tiny Images dataset and consists of 60,000 32×32 color images. Each image is labelled with one of 10 mutually exclusive classes: airplane, automobile (but not truck or pickup truck), bird, cat, deer, dog, frog, horse, ship, and truck (but not pickup truck). There are 6,000 images per class: 5,000 for training and 1,000 for testing.
5,635 PAPERS • 44 BENCHMARKS
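As a quick sanity check on the split sizes above, the sketch below multiplies the per-class counts out: 10 classes × 5,000 training images gives 50,000, and 10 × 1,000 test images gives 10,000, for 60,000 in total. The order of `CIFAR10_CLASSES` simply follows the description, and the helper `split_sizes` is a hypothetical name used only for illustration.

```python
# Class names in the order listed in the description above (order is illustrative).
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]

TRAIN_PER_CLASS = 5000
TEST_PER_CLASS = 1000


def split_sizes(num_classes: int, train_per_class: int, test_per_class: int) -> dict:
    """Return total train/test/overall image counts for a class-balanced split."""
    train = num_classes * train_per_class
    test = num_classes * test_per_class
    return {"train": train, "test": test, "total": train + test}


sizes = split_sizes(len(CIFAR10_CLASSES), TRAIN_PER_CLASS, TEST_PER_CLASS)
print(sizes)  # {'train': 50000, 'test': 10000, 'total': 60000}
```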
The ImageNet dataset contains 14,197,122 images annotated according to the WordNet hierarchy. Since 2010 the dataset has been used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a benchmark in image classification and object detection. The publicly released dataset contains a set of manually annotated training images. A set of test images is also released, with the manual annotations withheld. ILSVRC annotations fall into one of two categories: (1) image-level annotation of a binary label for the presence or absence of an object class in the image, e.g., “there are cars in this image” but “there are no tigers,” and (2) object-level annotation of a tight bounding box and class label around an object instance in the image, e.g., “there is a screwdriver centered at position (20,25) with width of 50 pixels and height of 30 pixels”. The ImageNet project does not own the copyright of the images; therefore only thumbnails and URLs of images are provided.
5,569 PAPERS • 56 BENCHMARKS
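The two ILSVRC annotation categories can be modelled as two small record types. This is a minimal sketch under stated assumptions: the class names `ImageLevelAnnotation` and `ObjectLevelAnnotation` are hypothetical, and the centre-plus-size box encoding mirrors the screwdriver example in the prose rather than any particular released annotation file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageLevelAnnotation:
    """Binary label: is at least one instance of `object_class` present in the image?"""
    object_class: str
    present: bool


@dataclass(frozen=True)
class ObjectLevelAnnotation:
    """Tight bounding box around one object instance, as centre + size in pixels.

    Hypothetical encoding chosen to match the prose example, not a file format.
    """
    object_class: str
    center_x: float
    center_y: float
    width: float
    height: float

    def area(self) -> float:
        # Box area in square pixels.
        return self.width * self.height


# The examples from the description, encoded with these hypothetical types.
cars = ImageLevelAnnotation("car", present=True)
tigers = ImageLevelAnnotation("tiger", present=False)
screwdriver = ObjectLevelAnnotation("screwdriver", center_x=20, center_y=25, width=50, height=30)
print(screwdriver.area())  # 1500
```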
The TinyImages dataset contains 80 million 32×32 images collected from the Internet by using nouns from WordNet as search terms.
66 PAPERS • NO BENCHMARKS YET
Reddit12k contains 11,929 graphs, each corresponding to an online discussion thread: nodes represent users, and an edge indicates that one of the two users responded to a comment by the other. Each of these discussion graphs is assigned one of 11 labels, representing the category of the community.
15 PAPERS • NO BENCHMARKS YET
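A discussion thread can be turned into one of these user graphs roughly as follows. The sketch assumes a hypothetical input of `(replier, replied_to)` username pairs, one per reply, and treats edges as undirected, since the description only says that one of the two users responded to the other. Ignoring self-replies is also an assumption.

```python
def build_thread_graph(replies):
    """Build an undirected user graph from (replier, replied_to) pairs.

    Nodes are users; an edge {u, v} exists if u replied to v or v replied to u.
    Returns an adjacency mapping: user -> set of neighbouring users.
    """
    adjacency = {}
    for replier, replied_to in replies:
        adjacency.setdefault(replier, set())
        adjacency.setdefault(replied_to, set())
        # Assumption: a user replying to their own comment adds no edge.
        if replier != replied_to:
            adjacency[replier].add(replied_to)
            adjacency[replied_to].add(replier)
    return adjacency


def edge_count(adjacency):
    """Each undirected edge appears in two neighbour sets, so halve the total."""
    return sum(len(neighbours) for neighbours in adjacency.values()) // 2


thread = [("bob", "alice"), ("carol", "alice"), ("alice", "bob"), ("dave", "carol")]
graph = build_thread_graph(thread)
print(len(graph), edge_count(graph))  # 4 3
```

Repeated replies between the same two users (here bob and alice) collapse into a single edge, which matches the "one of the two users responded" wording.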
Visual Wake Words represents a common microcontroller vision use case: identifying whether or not a person is present in an image. It provides a realistic benchmark for tiny vision models.
6 PAPERS • NO BENCHMARKS YET
The Groove MIDI Dataset (GMD) is composed of 13.6 hours of aligned MIDI and (synthesized) audio of human-performed, tempo-aligned expressive drumming. The dataset contains 1,150 MIDI files and over 22,000 measures of drumming.
5 PAPERS • NO BENCHMARKS YET
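The stated totals imply rough per-file averages: 13.6 h × 3,600 s/h ÷ 1,150 files ≈ 42.6 s of drumming per file, and at least 22,000 ÷ 1,150 ≈ 19.1 measures per file. These are means derived from the totals only, not statistics published with the dataset; the sketch below just does the arithmetic, and the helper names are hypothetical.

```python
HOURS_TOTAL = 13.6
NUM_FILES = 1_150
MEASURES_AT_LEAST = 22_000  # the description says "over 22,000", so this is a lower bound


def mean_seconds_per_file(hours: float, files: int) -> float:
    """Average duration per file, assuming the hours are spread over all files."""
    return hours * 3600 / files


def mean_measures_per_file(measures: int, files: int) -> float:
    """Average number of measures per file (a lower bound given 'over 22,000')."""
    return measures / files


print(round(mean_seconds_per_file(HOURS_TOTAL, NUM_FILES), 1))       # 42.6
print(round(mean_measures_per_file(MEASURES_AT_LEAST, NUM_FILES), 1))  # 19.1
```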
The Kannada-MNIST dataset is a drop-in substitute for the standard MNIST dataset, containing handwritten digits written in the Kannada script.
4 PAPERS • NO BENCHMARKS YET
The Bach Doodle Dataset is composed of 21.6 million harmonizations submitted from the Bach Doodle. For each composition, the dataset contains metadata (such as the country of origin and feedback), a MIDI file of the user-entered melody, and a MIDI file of the generated harmonization. In total, the dataset contains about 6 years of user-entered music.
3 PAPERS • NO BENCHMARKS YET
The CAL10K dataset (introduced as Swat10k) contains 10,870 songs that are weakly labelled using a tag vocabulary of 475 acoustic tags and 153 genre tags. The tags were all harvested from Pandora’s website and come from song annotations made by expert musicologists involved with the Music Genome Project.
1 PAPER • NO BENCHMARKS YET
FAS100K is a large-scale visual localization dataset. It consists of two traverses of 238 km and 130 km respectively, the latter being a partial repeat of the former. The data was collected in Australia using stereo cameras in sunny daytime conditions, and covers a variety of road and environment types, including urban and rural areas. The raw image data from one of the cameras, streaming at 5 Hz, comprises 63,650 and 34,497 image frames for the two traverses respectively.
1 PAPER • NO BENCHMARKS YET
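Assuming the frames were captured continuously at 5 Hz over each whole traverse (the description does not say whether there are gaps), the frame counts translate into approximate recording durations, frame spacing, and mean speeds. This is a back-of-the-envelope sketch; the helper name `traverse_stats` is hypothetical.

```python
FRAME_RATE_HZ = 5.0

# (distance in km, number of frames) per traverse, as stated in the description.
TRAVERSES = {
    "traverse_1": (238.0, 63_650),
    "traverse_2": (130.0, 34_497),
}


def traverse_stats(distance_km: float, frames: int, rate_hz: float = FRAME_RATE_HZ) -> dict:
    """Estimate recording duration, spacing between frames, and mean speed.

    Assumes frames were captured continuously at `rate_hz` for the whole traverse.
    """
    duration_s = frames / rate_hz
    duration_h = duration_s / 3600
    return {
        "duration_h": duration_h,
        "metres_per_frame": distance_km * 1000 / frames,
        "mean_speed_kmh": distance_km / duration_h,
    }


for name, (km, frames) in TRAVERSES.items():
    s = traverse_stats(km, frames)
    print(f"{name}: {s['duration_h']:.2f} h, "
          f"{s['metres_per_frame']:.2f} m/frame, {s['mean_speed_kmh']:.1f} km/h")
```

Under that assumption, both traverses come out at roughly 3.7 to 3.8 m between consecutive frames.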