The CIFAR-10 dataset (Canadian Institute for Advanced Research, 10 classes) is a subset of the Tiny Images dataset and consists of 60000 32x32 color images. The images are labelled with one of 10 mutually exclusive classes: airplane, automobile (but not truck or pickup truck), bird, cat, deer, dog, frog, horse, ship, and truck (but not pickup truck). There are 6000 images per class with 5000 training and 1000 testing images per class.
5,634 PAPERS • 44 BENCHMARKS
The ImageNet dataset contains 14,197,122 annotated images according to the WordNet hierarchy. Since 2010 the dataset is used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a benchmark in image classification and object detection. The publicly released dataset contains a set of manually annotated training images. A set of test images is also released, with the manual annotations withheld. ILSVRC annotations fall into one of two categories: (1) image-level annotation of a binary label for the presence or absence of an object class in the image, e.g., “there are cars in this image” but “there are no tigers,” and (2) object-level annotation of a tight bounding box and class label around an object instance in the image, e.g., “there is a screwdriver centered at position (20,25) with width of 50 pixels and height of 30 pixels”. The ImageNet project does not own the copyright of the images, therefore only thumbnails and URLs of images are provided.
5,569 PAPERS • 56 BENCHMARKS
The MNIST database (Modified National Institute of Standards and Technology database) is a large collection of handwritten digits. It has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger NIST Special Database 3 (digits written by employees of the United States Census Bureau) and Special Database 1 (digits written by high school students) which contain monochrome images of handwritten digits. The digits have been size-normalized and centered in a fixed-size image. The original black and white (bilevel) images from NIST were size normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. the images were centered in a 28x28 image by computing the center of mass of the pixels, and translating the image so as to position this point at the center of the 28x28 field.
3,881 PAPERS • 36 BENCHMARKS
BSD is a dataset used frequently for image denoising and super-resolution. Of the subdatasets, BSD100 is aclassical image dataset having 100 test images proposed by Martin et al.. The dataset is composed of a large variety of images ranging from natural images to object-specific such as plants, people, food etc. BSD100 is the testing set of the Berkeley segmentation dataset BSD300.
348 PAPERS • 42 BENCHMARKS
Omniglot is a large dataset of hand-written characters with 1623 characters and 20 examples for each character. These characters are collected based upon 50 alphabets from different countries. It contains both images and strokes data. Stroke data are coordinates with time in miliseconds.
330 PAPERS • 14 BENCHMARKS
The Caltech101 dataset contains images from 101 object categories (e.g., “helicopter”, “elephant” and “chair” etc.) and a background category that contains the images not from the 101 object categories. For each object category, there are about 40 to 800 images, while most classes have about 50 images. The resolution of the image is roughly about 300×200 pixels.
242 PAPERS • 6 BENCHMARKS
The Caltech 101 Silhouettes dataset consists of 4,100 training samples, 2,264 validation samples and 2,307 test samples. The datast is based on CalTech 101 image annotations. Each image in the CalTech 101 data set includes a high-quality polygon outline of the primary object in the scene. To create the CalTech 101 Silhouettes data set, the authors center and scale each outline and render it on a DxD pixel image-plane. The outline is rendered as a filled, black polygon on a white background. Many object classes exhibit silhouettes that have distinctive class-specific features. A relatively small number of classes like soccer ball, pizza, stop sign, and yin-yang are indistinguishable based on shape, but have been left-in in the data.
26 PAPERS • NO BENCHMARKS YET
UCF-CC-50 is a dataset for crowd counting and consists of images of extremely dense crowds. It has 50 images with 63,974 head center annotations in total. The head counts range between 94 and 4,543 per image. The small dataset size and large variance make this a very challenging counting dataset.
18 PAPERS • NO BENCHMARKS YET
UCI Machine Learning Repository is a collection of over 550 datasets.
14 PAPERS • 7 BENCHMARKS
The Cars Overhead With Context (COWC) data set is a large set of annotated cars from overhead. It is useful for training a device such as a deep neural network to learn to detect and/or count cars.
13 PAPERS • NO BENCHMARKS YET
(JHU-CROWD) a crowd counting dataset that contains 4,250 images with 1.11 million annotations. This dataset is collected under a variety of diverse scenarios and environmental conditions. Specifically, the dataset includes several images with weather-based degradations and illumination variations in addition to many distractor images, making it a very challenging dataset. Additionally, the dataset consists of rich annotations at both image-level and head-level.
9 PAPERS • NO BENCHMARKS YET
JHU-CROWD++ is A large-scale unconstrained crowd counting dataset with 4,372 images and 1.51 million annotations. This dataset is collected under a variety of diverse scenarios and environmental conditions. In addition, the dataset provides comparatively richer set of annotations like dots, approximate bounding boxes, blur levels, etc.
8 PAPERS • NO BENCHMARKS YET
APRICOT is a collection of over 1,000 annotated photographs of printed adversarial patches in public locations. The patches target several object categories for three COCO-trained detection models, and the photos represent natural variation in position, distance, lighting conditions, and viewing angle.
1 PAPER • NO BENCHMARKS YET