ImageNet-9 consists of images with varying amounts of background and foreground signal. It is used to measure how heavily vision models rely on image backgrounds, i.e. to test their robustness with respect to background dependence.
5 PAPERS • 1 BENCHMARK
ImageNet-O consists of images from classes that are not found in the ImageNet-1k dataset. It is used to test the robustness of vision models to out-of-distribution samples. Results are reported using the area under the precision-recall curve (AUPR) metric.
76 PAPERS • NO BENCHMARKS YET
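Since ImageNet-O results are reported with AUPR, a minimal sketch of the underlying average-precision computation may help; the detector scores below are made up for illustration, with label 1 marking an out-of-distribution sample.

```python
# Illustrative AUPR-style average precision over the OOD-positive class.
# Scores are hypothetical; higher score = detector thinks "more OOD".

def average_precision(labels, scores):
    """Mean of the precision measured at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits, total, ap = 0, sum(labels), 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            ap += hits / rank  # precision at this positive's rank
    return ap / total

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 0, 1, 0, 0, 1]
aupr = average_precision(labels, scores)  # → 13/18 ≈ 0.722
```

Library implementations (e.g. scikit-learn's `average_precision_score`) compute the same quantity and would normally be used in practice.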
A dataset of images of natural scenes from around the world.
4 PAPERS • 2 BENCHMARKS
The Kannada-MNIST dataset is a drop-in substitute for the standard MNIST dataset for the Kannada language.
7 PAPERS • NO BENCHMARKS YET
Consists of faces extracted from pre-modern Japanese artwork.
Kuzushiji-49 is an MNIST-like dataset that has 49 classes (28x28 grayscale, 270,912 images) from 48 Hiragana characters and one Hiragana iteration mark.
10 PAPERS • NO BENCHMARKS YET
Kuzushiji-Kanji is an imbalanced dataset of 3,832 Kanji character classes (64x64 grayscale, 140,426 images), ranging from 1,766 examples down to a single example per class. Kuzushiji is a Japanese cursive writing style.
4 PAPERS • NO BENCHMARKS YET
Kuzushiji-MNIST is a drop-in replacement for the MNIST dataset (28x28 grayscale, 70,000 images). Since MNIST restricts us to 10 classes, the authors chose one character to represent each of the 10 rows of Hiragana when creating Kuzushiji-MNIST. Kuzushiji is a Japanese cursive writing style.
82 PAPERS • 2 BENCHMARKS
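Kuzushiji-MNIST is distributed in the same IDX file layout as MNIST (among other formats), which is what makes it a drop-in replacement. A minimal sketch of parsing that shared header, using a synthetic byte buffer in place of a real `train-images-idx3-ubyte` file:

```python
import struct

# MNIST-style IDX image files begin with a big-endian header:
# magic 2051 (0x00000803), image count, rows, cols, then raw pixel bytes.
MAGIC_IMAGES = 2051

def parse_idx_images(buf):
    magic, count, rows, cols = struct.unpack_from(">IIII", buf, 0)
    assert magic == MAGIC_IMAGES, "not an IDX image file"
    pixels = buf[16:]
    assert len(pixels) == count * rows * cols
    return count, rows, cols, pixels

# Synthetic two-image buffer standing in for a downloaded file.
fake = struct.pack(">IIII", MAGIC_IMAGES, 2, 28, 28) + bytes(2 * 28 * 28)
count, rows, cols, _ = parse_idx_images(fake)
```

Because the layout matches, any MNIST loader (e.g. `torchvision.datasets.KMNIST`) can consume Kuzushiji-MNIST files unchanged.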
The Kvasir-Capsule dataset is the largest publicly released video capsule endoscopy (VCE) dataset. In total, it contains 47,238 labeled images and 117 videos capturing anatomical landmarks as well as pathological and normal findings, which amounts to more than 4,741,621 images and video frames altogether.
2 PAPERS • NO BENCHMARKS YET
Includes 5,824 fundus images labeled with either positive glaucoma (2,392) or negative glaucoma (3,432).
18 PAPERS • 1 BENCHMARK
An ancestral origin database of 14,000 images of individuals from East Asia, the Indian subcontinent, sub-Saharan Africa, and Western Europe.
1 PAPER • NO BENCHMARKS YET
LKS is a dataset of 684 Liver-Kidney-Stomach immunofluorescence whole slide images (WSIs) used in the investigation of autoimmune liver disease.
3 PAPERS • NO BENCHMARKS YET
The largest annotated image memorability dataset to date, with 60,000 labeled images from a diverse array of sources.
16 PAPERS • NO BENCHMARKS YET
The MAMe dataset contains high-resolution, variable-shape images of artworks from three different museums.
2 PAPERS • 1 BENCHMARK
MINC is a large-scale, open dataset of materials in the wild.
53 PAPERS • NO BENCHMARKS YET
MLRSNet is a multi-label, high spatial resolution remote sensing dataset for semantic scene understanding, composed of optical satellite images that provide different perspectives of the world. MLRSNet contains 109,161 remote sensing images annotated into 46 categories, with between 1,500 and 3,000 sample images per category. The images have a fixed size of 256×256 pixels with various pixel resolutions (~10m to 0.1m). Moreover, each image is tagged with several of 60 predefined class labels, and the number of labels per image varies from 1 to 13. The dataset can be used for multi-label image classification, multi-label image retrieval, and image segmentation.
11 PAPERS • 1 BENCHMARK
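Since each MLRSNet image carries between 1 and 13 of 60 predefined labels, multi-label training targets are typically binary vectors rather than single class indices. A minimal sketch, using a small hypothetical label vocabulary (the real 60-label list is not reproduced here):

```python
# Hypothetical subset of MLRSNet-style scene labels.
LABELS = ["airplane", "bare soil", "buildings", "cars", "grass",
          "pavement", "road", "trees", "water", "ship"]
LABEL_INDEX = {name: i for i, name in enumerate(LABELS)}

def encode_multilabel(tags, index=LABEL_INDEX):
    """Binary target vector for multi-label classification."""
    vec = [0] * len(index)
    for tag in tags:
        vec[index[tag]] = 1
    return vec

# One image tagged with three of the predefined labels.
y = encode_multilabel(["road", "cars", "trees"])
```

A multi-label classifier would then be trained against such vectors with a per-label loss (e.g. binary cross-entropy over the 60 outputs).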
The dataset contains a total of 27,558 cell images with equal instances of parasitized and uninfected cells.
5 PAPERS • 2 BENCHMARKS
A dataset of images of Moroccan currency.
0 PAPERS • NO BENCHMARKS YET
The Neuromorphic-MNIST (N-MNIST) dataset is a spiking version of the original frame-based MNIST dataset. It consists of the same 60,000 training and 10,000 testing samples as the original MNIST dataset, captured at the same visual scale (28x28 pixels). The N-MNIST dataset was captured by mounting an ATIS event-based sensor on a motorized pan-tilt unit and moving the sensor while it viewed MNIST examples on an LCD monitor.
13 PAPERS • 1 BENCHMARK
NAS-Bench-201 is a benchmark (and search space) for neural architecture search. Each architecture consists of a predefined skeleton with a stack of the searched cell. In this way, architecture search is transformed into the problem of searching a good cell.
243 PAPERS • 4 BENCHMARKS
The Oxford-IIIT Pet Dataset is a 37-category pet dataset with roughly 200 images for each class. The images have large variations in scale, pose, and lighting. All images have an associated ground truth annotation of breed, head ROI, and pixel-level trimap segmentation.
42 PAPERS • 7 BENCHMARKS
PASCAL VOC 2007 is a dataset for image recognition. The twenty object classes that have been selected are:
119 PAPERS • 14 BENCHMARKS
The Prima head pose dataset consists of 2,790 images of 15 persons recorded twice. Pitch values lie in the interval [−60°, 60°] and yaw values in the interval [−90°, 90°], with a 15° step; thus, 93 poses are available for each person. All recordings were made against the same background. One interesting feature of this dataset is that the pose space is uniformly sampled. Each sample is annotated with a manually drawn face bounding box and the corresponding yaw and pitch angle values.
1 PAPER • 1 BENCHMARK
The PS-Battles dataset is gathered from a large community of image manipulation enthusiasts and provides a basis for media derivation and manipulation detection in the visual domain. The dataset consists of 102,028 images grouped into 11,142 subsets, each containing the original image as well as a varying number of manipulated derivatives.
6 PAPERS • NO BENCHMARKS YET
PlantDoc is a dataset for visual plant disease detection. The dataset contains 2,598 data points in total across 13 plant species and up to 17 classes of diseases, involving approximately 300 human hours of effort in annotating internet scraped images.
12 PAPERS • 1 BENCHMARK
A collection of five open polarimetric SAR images of the San Francisco area, acquired by different satellites at different times, which makes them valuable for scientific research.
The exact pre-processing steps used to construct the MNIST dataset have long been lost. This leaves us with no reliable way to associate its characters with the ID of the writer and little hope to recover the full MNIST testing set that had 60K images but was never released. The official MNIST testing set only contains 10K randomly sampled images and is often considered too small to provide meaningful confidence intervals. The QMNIST dataset was generated from the original data found in the NIST Special Database 19 with the goal to match the MNIST preprocessing as closely as possible. QMNIST is licensed under the BSD-style license.
23 PAPERS • 2 BENCHMARKS
A synthetic dataset used for systematic analysis across common factors of variation.
4 PAPERS • 1 BENCHMARK
The Scene UNderstanding (SUN) database contains 899 categories and 130,519 images. There are 397 well-sampled categories to evaluate numerous state-of-the-art algorithms for scene recognition.
34 PAPERS • 5 BENCHMARKS
Street View House Numbers (SVHN) is a digit classification benchmark dataset that contains 600,000 32×32 RGB images of printed digits (from 0 to 9) cropped from pictures of house number plates. The cropped images are centered on the digit of interest, but nearby digits and other distractors are kept in the image. SVHN has three sets: training, testing, and an extra set with 530,000 images that are less difficult and can be used to help with the training process.
3,087 PAPERS • 12 BENCHMARKS
So2Sat LCZ42 consists of local climate zone (LCZ) labels for about half a million Sentinel-1 and Sentinel-2 image patches in 42 urban agglomerations (plus 10 additional smaller areas) across the globe. The dataset was labeled by 15 domain experts following a carefully designed labeling workflow and evaluation process over a period of six months.
A new dataset for streaming classification consisting of temporally correlated images from 51 distinct object categories and additional evaluation classes outside of the training distribution to test novelty recognition.
Tencent ML-Images is a large open-source multi-label image database, including 17,609,752 training and 88,739 validation image URLs, which are annotated with up to 11,166 categories.
5 PAPERS • NO BENCHMARKS YET
Tiny ImageNet contains 200 classes of images downsized to 64×64 color images. Each class has 500 training images, 50 validation images, and 50 test images, for 100,000 training images in total.
948 PAPERS • 8 BENCHMARKS
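Tiny ImageNet's defining preprocessing step is downsizing to 64×64. As an illustration only (not the original pipeline, which resized full ImageNet photos), here is 2× block-average downsampling of a synthetic 128×128 grayscale grid:

```python
def downsample_2x(img):
    """Average each 2x2 block of a 2D list-of-lists grayscale image."""
    h, w = len(img), len(img[0])
    return [[(img[2*r][2*c] + img[2*r][2*c+1] +
              img[2*r+1][2*c] + img[2*r+1][2*c+1]) // 4
             for c in range(w // 2)]
            for r in range(h // 2)]

# Synthetic 128x128 gradient image standing in for a real photo.
src = [[(r + c) % 256 for c in range(128)] for r in range(128)]
small = downsample_2x(src)  # 64x64 result
```

In practice one would use an image library (e.g. Pillow's `Image.resize`) with a proper antialiasing filter rather than naive block averaging.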
The Urban Environments dataset is a dataset of 20 land use classes across 300 European cities paired with satellite imagery data.
Consists of more than 210k videos for 310 audio classes.
151 PAPERS • 3 BENCHMARKS
The Vistas-NP dataset is an out-of-distribution detection dataset based on the Mapillary Vistas dataset. The original Vistas dataset consists of 18,000 training images and 2,000 validation images with 66 classes. In Vistas-NP the human classes are used as outliers due to their dispersion across scenes and visual diversity from other objects. The dataset is created by moving all images containing the person class and the three rider classes into the test subset. Consequently, the dataset has 8,003 training images and 830 validation images, while the test set contains 11,167 images.
Web of Science (WOS) is a document classification dataset that contains 46,985 documents with 134 categories, including 7 parent categories.
48 PAPERS • 4 BENCHMARKS
The WebVision dataset is designed to facilitate research on learning visual representations from noisy web data. It is a large-scale web image dataset that contains more than 2.4 million images crawled from the Flickr website and Google Images search.
170 PAPERS • 4 BENCHMARKS
YFCC100M is a dataset that contains a total of 100 million media objects, of which approximately 99.2 million are photos and 0.8 million are videos, all of which carry a Creative Commons license. Each media object in the dataset is represented by several pieces of metadata, e.g. Flickr identifier, owner name, camera, title, tags, geo, and media source. The collection provides a comprehensive snapshot of how photos and videos were taken, described, and shared over the years, from the inception of Flickr in 2004 until early 2014.
224 PAPERS • NO BENCHMARKS YET
The YFCC100M Fine-Grained Geolocation dataset is a set of 36,146 YFCC100M images whose Flickr tags could be identified as corresponding to one of the labels in the iNaturalist 2017 dataset. The 36,146 images were selected to have the following characteristics: the image must have geolocation available, the image must have at most one iNaturalist label, and at most ten examples were retained for each label.
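The selection rules described above (geolocation available, at most one iNaturalist label, at most ten examples per label) can be sketched as a simple filter; the record fields and species name below are hypothetical, not taken from the actual dataset.

```python
def select_images(records, cap=10):
    """Apply the three selection rules to hypothetical photo records."""
    kept, per_label = [], {}
    for rec in records:
        if rec.get("geo") is None:          # rule 1: geolocation available
            continue
        labels = rec.get("labels", [])
        if len(labels) != 1:                # rule 2: at most one label
            continue                        # (and at least one, to be usable)
        label = labels[0]
        if per_label.get(label, 0) >= cap:  # rule 3: cap of 10 per label
            continue
        per_label[label] = per_label.get(label, 0) + 1
        kept.append(rec)
    return kept

# 12 usable records for one label, plus two that violate rules 1 and 2.
records = ([{"geo": (48.8, 2.3), "labels": ["Quercus robur"]}] * 12
           + [{"geo": None, "labels": ["Quercus robur"]},
              {"geo": (0.0, 0.0), "labels": ["a", "b"]}])
kept = select_images(records)  # capped at 10 kept records
```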
Functional Map of the World (fMoW) is a dataset that aims to inspire the development of machine learning models capable of predicting the functional purpose of buildings and land use from temporal sequences of satellite images and a rich set of metadata features.
108 PAPERS • NO BENCHMARKS YET
The iCartoonFace dataset is a large-scale dataset that can be used for two different tasks: cartoon face detection and cartoon face recognition.
7 PAPERS • 1 BENCHMARK
The iNaturalist Fine-Grained Geolocation dataset is an extension of the iNaturalist dataset with complementary geolocation information.