ImageNet-9 consists of images with varying amounts of background and foreground signal. It is used to measure how heavily vision models rely on image backgrounds, i.e. to test their robustness with respect to background dependence.
5 PAPERS • 1 BENCHMARK
ImageNet-O consists of images from classes that are not found in the ImageNet-1k dataset. It is used to test the robustness of vision models to out-of-distribution samples. Results are reported using the area under the precision-recall curve (AUPR) metric.
76 PAPERS • NO BENCHMARKS YET
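Since ImageNet-O results are reported with AUPR, a minimal sketch of the underlying average-precision computation may help; the detector scores below are made up for illustration, with label 1 marking an out-of-distribution sample.

```python
# Illustrative AUPR-style average precision over the OOD-positive class.
# Scores are hypothetical; higher score = detector thinks "more OOD".

def average_precision(labels, scores):
    """Mean of the precision measured at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits, total, ap = 0, sum(labels), 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            ap += hits / rank  # precision at this positive's rank
    return ap / total

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 0, 1, 0, 0, 1]
aupr = average_precision(labels, scores)  # → 13/18 ≈ 0.722
```

Library implementations (e.g. scikit-learn's `average_precision_score`) compute the same quantity and would normally be used in practice.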
A dataset of images of natural scenes from around the world.
4 PAPERS • 2 BENCHMARKS
The Kannada-MNIST dataset is a drop-in substitute for the standard MNIST dataset for the Kannada language.
7 PAPERS • NO BENCHMARKS YET
Consists of faces extracted from pre-modern Japanese artwork.
Kuzushiji-49 is an MNIST-like dataset that has 49 classes (28x28 grayscale, 270,912 images) from 48 Hiragana characters and one Hiragana iteration mark.
10 PAPERS • NO BENCHMARKS YET
Kuzushiji-Kanji is an imbalanced dataset of 3,832 Kanji character classes (64x64 grayscale, 140,426 images), ranging from 1,766 examples down to a single example per class. Kuzushiji is a Japanese cursive writing style.
4 PAPERS • NO BENCHMARKS YET
Kuzushiji-MNIST is a drop-in replacement for the MNIST dataset (28x28 grayscale, 70,000 images). Since MNIST restricts us to 10 classes, the authors chose one character to represent each of the 10 rows of Hiragana when creating Kuzushiji-MNIST. Kuzushiji is a Japanese cursive writing style.
82 PAPERS • 2 BENCHMARKS
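Kuzushiji-MNIST is distributed in the same IDX file layout as MNIST (among other formats), which is what makes it a drop-in replacement. A minimal sketch of parsing that shared header, using a synthetic byte buffer in place of a real `train-images-idx3-ubyte` file:

```python
import struct

# MNIST-style IDX image files begin with a big-endian header:
# magic 2051 (0x00000803), image count, rows, cols, then raw pixel bytes.
MAGIC_IMAGES = 2051

def parse_idx_images(buf):
    magic, count, rows, cols = struct.unpack_from(">IIII", buf, 0)
    assert magic == MAGIC_IMAGES, "not an IDX image file"
    pixels = buf[16:]
    assert len(pixels) == count * rows * cols
    return count, rows, cols, pixels

# Synthetic two-image buffer standing in for a downloaded file.
fake = struct.pack(">IIII", MAGIC_IMAGES, 2, 28, 28) + bytes(2 * 28 * 28)
count, rows, cols, _ = parse_idx_images(fake)
```

Because the layout matches, any MNIST loader (e.g. `torchvision.datasets.KMNIST`) can consume Kuzushiji-MNIST files unchanged.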
The Kvasir-Capsule dataset is the largest publicly released video capsule endoscopy (VCE) dataset. In total, it contains 47,238 labeled images and 117 videos capturing anatomical landmarks as well as pathological and normal findings, which amounts to more than 4,741,621 images and video frames altogether.
2 PAPERS • NO BENCHMARKS YET
Includes 5,824 fundus images labeled with either positive glaucoma (2,392) or negative glaucoma (3,432).
18 PAPERS • 1 BENCHMARK
An ancestral origin database of 14,000 images of individuals from East Asia, the Indian subcontinent, sub-Saharan Africa, and Western Europe.
1 PAPER • NO BENCHMARKS YET
LKS is a dataset of 684 Liver-Kidney-Stomach immunofluorescence whole slide images (WSIs) used in the investigation of autoimmune liver disease.
3 PAPERS • NO BENCHMARKS YET
The largest annotated image memorability dataset to date, with 60,000 labeled images from a diverse array of sources.
16 PAPERS • NO BENCHMARKS YET
The MAMe dataset contains high-resolution, variable-shape images of artworks from three different museums.
2 PAPERS • 1 BENCHMARK
MINC is a large-scale, open dataset of materials in the wild.
53 PAPERS • NO BENCHMARKS YET
MLRSNet is a multi-label, high spatial resolution remote sensing dataset for semantic scene understanding, composed of optical satellite images that provide different perspectives of the world. MLRSNet contains 109,161 remote sensing images annotated into 46 categories, with between 1,500 and 3,000 sample images per category. The images have a fixed size of 256×256 pixels with various pixel resolutions (~10m to 0.1m). Moreover, each image is tagged with several of 60 predefined class labels, and the number of labels per image varies from 1 to 13. The dataset can be used for multi-label image classification, multi-label image retrieval, and image segmentation.
11 PAPERS • 1 BENCHMARK
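Since each MLRSNet image carries between 1 and 13 of 60 predefined labels, multi-label training targets are typically binary vectors rather than single class indices. A minimal sketch, using a small hypothetical label vocabulary (the real 60-label list is not reproduced here):

```python
# Hypothetical subset of MLRSNet-style scene labels.
LABELS = ["airplane", "bare soil", "buildings", "cars", "grass",
          "pavement", "road", "trees", "water", "ship"]
LABEL_INDEX = {name: i for i, name in enumerate(LABELS)}

def encode_multilabel(tags, index=LABEL_INDEX):
    """Binary target vector for multi-label classification."""
    vec = [0] * len(index)
    for tag in tags:
        vec[index[tag]] = 1
    return vec

# One image tagged with three of the predefined labels.
y = encode_multilabel(["road", "cars", "trees"])
```

A multi-label classifier would then be trained against such vectors with a per-label loss (e.g. binary cross-entropy over the 60 outputs).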
The dataset contains a total of 27,558 cell images with equal instances of parasitized and uninfected cells.
5 PAPERS • 2 BENCHMARKS
A dataset of images of Moroccan currency.
0 PAPERS • NO BENCHMARKS YET
The Neuromorphic-MNIST (N-MNIST) dataset is a spiking version of the original frame-based MNIST dataset. It consists of the same 60,000 training and 10,000 testing samples as the original MNIST dataset, captured at the same visual scale (28x28 pixels). The N-MNIST dataset was captured by mounting an ATIS event-based sensor on a motorized pan-tilt unit and moving the sensor while it viewed MNIST examples on an LCD monitor.
13 PAPERS • 1 BENCHMARK
NAS-Bench-201 is a benchmark (and search space) for neural architecture search. Each architecture consists of a predefined skeleton with a stack of the searched cell. In this way, architecture search is transformed into the problem of searching a good cell.
243 PAPERS • 4 BENCHMARKS
The Oxford-IIIT Pet Dataset is a 37-category pet dataset with roughly 200 images for each class. The images have large variations in scale, pose, and lighting. All images have an associated ground truth annotation of breed, head ROI, and pixel-level trimap segmentation.
42 PAPERS • 7 BENCHMARKS
PASCAL VOC 2007 is a dataset for image recognition. The twenty object classes that have been selected are:
119 PAPERS • 14 BENCHMARKS
The Prima head pose dataset consists of 2,790 images of 15 persons recorded twice. Pitch values lie in the interval [−60°, 60°] and yaw values in the interval [−90°, 90°], with a 15° step; thus, 93 poses are available for each person. All recordings were made against the same background. One interesting feature of this dataset is that the pose space is uniformly sampled. Each sample is annotated with a manually drawn face bounding box and the corresponding yaw and pitch angle values.
1 PAPER • 1 BENCHMARK
The PS-Battles dataset is gathered from a large community of image manipulation enthusiasts and provides a basis for media derivation and manipulation detection in the visual domain. The dataset consists of 102,028 images grouped into 11,142 subsets, each containing the original image as well as a varying number of manipulated derivatives.
6 PAPERS • NO BENCHMARKS YET
PlantDoc is a dataset for visual plant disease detection. The dataset contains 2,598 data points in total across 13 plant species and up to 17 classes of diseases, involving approximately 300 human hours of effort in annotating internet scraped images.
12 PAPERS • 1 BENCHMARK
A collection of five open polarimetric SAR images of the San Francisco area, acquired by different satellites at different times, which makes them valuable for scientific research.
The exact pre-processing steps used to construct the MNIST dataset have long been lost. This leaves us with no reliable way to associate its characters with the ID of the writer and little hope to recover the full MNIST testing set that had 60K images but was never released. The official MNIST testing set only contains 10K randomly sampled images and is often considered too small to provide meaningful confidence intervals. The QMNIST dataset was generated from the original data found in the NIST Special Database 19 with the goal to match the MNIST preprocessing as closely as possible. QMNIST is licensed under the BSD-style license.
23 PAPERS • 2 BENCHMARKS
A synthetic dataset used for systematic analysis across common factors of variation.
4 PAPERS • 1 BENCHMARK
The Scene UNderstanding (SUN) database contains 899 categories and 130,519 images. There are 397 well-sampled categories to evaluate numerous state-of-the-art algorithms for scene recognition.
34 PAPERS • 5 BENCHMARKS
Street View House Numbers (SVHN) is a digit classification benchmark dataset that contains 600,000 32×32 RGB images of printed digits (from 0 to 9) cropped from pictures of house number plates. The cropped images are centered on the digit of interest, but nearby digits and other distractors are kept in the image. SVHN has three sets: training, testing, and an extra set with 530,000 images that are less difficult and can be used to help with the training process.
3,087 PAPERS • 12 BENCHMARKS
So2Sat LCZ42 consists of local climate zone (LCZ) labels for about half a million Sentinel-1 and Sentinel-2 image patches in 42 urban agglomerations (plus 10 additional smaller areas) across the globe. The dataset was labeled by 15 domain experts following a carefully designed labeling workflow and evaluation process over a period of six months.
A new dataset for streaming classification consisting of temporally correlated images from 51 distinct object categories and additional evaluation classes outside of the training distribution to test novelty recognition.
Tencent ML-Images is a large open-source multi-label image database, including 17,609,752 training and 88,739 validation image URLs, which are annotated with up to 11,166 categories.
5 PAPERS • NO BENCHMARKS YET
Tiny ImageNet contains 200 classes of images downsized to 64×64 color images. Each class has 500 training images, 50 validation images, and 50 test images, for 100,000 training images in total.
948 PAPERS • 8 BENCHMARKS
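Tiny ImageNet's defining preprocessing step is downsizing to 64×64. As an illustration only (not the original pipeline, which resized full ImageNet photos), here is 2× block-average downsampling of a synthetic 128×128 grayscale grid:

```python
def downsample_2x(img):
    """Average each 2x2 block of a 2D list-of-lists grayscale image."""
    h, w = len(img), len(img[0])
    return [[(img[2*r][2*c] + img[2*r][2*c+1] +
              img[2*r+1][2*c] + img[2*r+1][2*c+1]) // 4
             for c in range(w // 2)]
            for r in range(h // 2)]

# Synthetic 128x128 gradient image standing in for a real photo.
src = [[(r + c) % 256 for c in range(128)] for r in range(128)]
small = downsample_2x(src)  # 64x64 result
```

In practice one would use an image library (e.g. Pillow's `Image.resize`) with a proper antialiasing filter rather than naive block averaging.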
The Urban Environments dataset is a dataset of 20 land use classes across 300 European cities paired with satellite imagery data.
Consists of more than 210k videos for 310 audio classes.
151 PAPERS • 3 BENCHMARKS
The Vistas-NP dataset is an out-of-distribution detection dataset based on the Mapillary Vistas dataset. The original Vistas dataset consists of 18,000 training images and 2,000 validation images with 66 classes. In Vistas-NP the human classes are used as outliers due to their dispersion across scenes and visual diversity from other objects. The dataset is created by moving all images containing the person class and the three rider classes into the test subset. Consequently, the dataset has 8,003 training images and 830 validation images, while the test set contains 11,167 images.
Web of Science (WOS) is a document classification dataset that contains 46,985 documents with 134 categories, including 7 parent categories.
48 PAPERS • 4 BENCHMARKS
The WebVision dataset is designed to facilitate research on learning visual representations from noisy web data. It is a large-scale web image dataset that contains more than 2.4 million images crawled from the Flickr website and Google Images search.
170 PAPERS • 4 BENCHMARKS
YFCC100M is a dataset that contains a total of 100 million media objects, of which approximately 99.2 million are photos and 0.8 million are videos, all of which carry a Creative Commons license. Each media object in the dataset is represented by several pieces of metadata, e.g. Flickr identifier, owner name, camera, title, tags, geo, and media source. The collection provides a comprehensive snapshot of how photos and videos were taken, described, and shared over the years, from the inception of Flickr in 2004 until early 2014.
224 PAPERS • NO BENCHMARKS YET
The YFCC100M Fine-Grained Geolocation dataset is a set of 36,146 YFCC100M images whose Flickr tags could be identified as corresponding to one of the labels in the iNaturalist 2017 dataset. The 36,146 images were selected to have the following characteristics: the image must have geolocation available, the image must have at most one iNaturalist label, and at most ten examples were retained for each label.
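The selection rules described above (geolocation available, at most one iNaturalist label, at most ten examples per label) can be sketched as a simple filter; the record fields and species name below are hypothetical, not taken from the actual dataset.

```python
def select_images(records, cap=10):
    """Apply the three selection rules to hypothetical photo records."""
    kept, per_label = [], {}
    for rec in records:
        if rec.get("geo") is None:          # rule 1: geolocation available
            continue
        labels = rec.get("labels", [])
        if len(labels) != 1:                # rule 2: at most one label
            continue                        # (and at least one, to be usable)
        label = labels[0]
        if per_label.get(label, 0) >= cap:  # rule 3: cap of 10 per label
            continue
        per_label[label] = per_label.get(label, 0) + 1
        kept.append(rec)
    return kept

# 12 usable records for one label, plus two that violate rules 1 and 2.
records = ([{"geo": (48.8, 2.3), "labels": ["Quercus robur"]}] * 12
           + [{"geo": None, "labels": ["Quercus robur"]},
              {"geo": (0.0, 0.0), "labels": ["a", "b"]}])
kept = select_images(records)  # capped at 10 kept records
```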
Functional Map of the World (fMoW) is a dataset that aims to inspire the development of machine learning models capable of predicting the functional purpose of buildings and land use from temporal sequences of satellite images and a rich set of metadata features.
108 PAPERS • NO BENCHMARKS YET
The iCartoonFace dataset is a large-scale dataset that can be used for two different tasks: cartoon face detection and cartoon face recognition.
7 PAPERS • 1 BENCHMARK
The iNaturalist Fine-Grained Geolocation dataset is an extension of the iNaturalist dataset with complementary geolocation information.