The MNIST database (Modified National Institute of Standards and Technology database) is a large collection of handwritten digits.
6,990 PAPERS • 52 BENCHMARKS
The Moving MNIST dataset contains 10,000 video sequences, each consisting of 20 frames.
182 PAPERS • 1 BENCHMARK
The Stacked MNIST dataset is derived from the standard MNIST dataset with an increased number of discrete modes. 240,000 RGB images of size 32×32 are synthesized by stacking three random digit images from MNIST along the color channel, resulting in 1,000 explicit modes, uniformly distributed, corresponding to the number of possible digit triples.
43 PAPERS • 1 BENCHMARK
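The stacking construction can be sketched in a few lines of NumPy (a minimal sketch; the zero-padding from 28×28 to 32×32 and the `l0*100 + l1*10 + l2` mode encoding are illustrative assumptions, not the original authors' code):

```python
import numpy as np

def stack_mnist(digits, labels, n_samples, rng=None):
    """Synthesize Stacked-MNIST-style images: three random MNIST digits
    stacked along the color channel, giving 1,000 possible label triples.

    digits: (N, 28, 28) uint8 array; labels: (N,) int array.
    Note: 28x28 -> 32x32 is done by zero-padding here (a simplification).
    """
    rng = np.random.default_rng(rng)
    idx = rng.integers(0, len(digits), size=(n_samples, 3))
    padded = np.pad(digits, ((0, 0), (2, 2), (2, 2)))  # 28x28 -> 32x32
    rgb = np.stack([padded[idx[:, c]] for c in range(3)], axis=-1)
    # Encode the digit triple as a single mode id in [0, 1000).
    modes = labels[idx[:, 0]] * 100 + labels[idx[:, 1]] * 10 + labels[idx[:, 2]]
    return rgb, modes
```

Counting how many of the 1,000 mode ids a generator covers is the usual way this dataset is used to measure mode collapse.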
Permuted MNIST is an MNIST variant that consists of 70,000 images of handwritten digits from 0 to 9, where 60,000 images are used for training and 10,000 for testing. It differs from the original MNIST in that each of its ten tasks is multi-class classification of a different fixed random permutation of the input pixels.
111 PAPERS • 2 BENCHMARKS
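Task construction reduces to applying one fixed pixel permutation per task; a minimal NumPy sketch (function and parameter names are illustrative):

```python
import numpy as np

def make_permuted_tasks(images, n_tasks=10, seed=0):
    """Build Permuted-MNIST tasks: each task applies one fixed random
    permutation to the 784 flattened pixels of every image."""
    rng = np.random.default_rng(seed)
    flat = images.reshape(len(images), -1)
    tasks = []
    for _ in range(n_tasks):
        perm = rng.permutation(flat.shape[1])
        tasks.append(flat[:, perm])  # same permutation for all images
    return tasks
```

Because the same permutation is shared within a task, each task is exactly as hard as MNIST for a fully connected network, which is why the benchmark is popular in continual learning.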
Colored MNIST is a synthetic binary classification task derived from MNIST.
175 PAPERS • NO BENCHMARKS YET
…Mechanical MNIST is generated by first converting the MNIST bitmap images (http://www.pymvpa.org/datadb/mnist.html) to 2D heterogeneous blocks of material. Consistent with the MNIST bitmap ($28 \times 28$ pixels), the material domain is a $28 \times 28$ unit square. The code to reproduce these simulations is hosted on GitHub (https://github.com/elejeune11/Mechanical-MNIST/tree/master/generate_dataset). The paper "Mechanical MNIST: A benchmark dataset for mechanical metamodels" can be found at https://doi.org/10.1016/j.eml.2020.100659. All code necessary to reproduce the metamodels demonstrated in the manuscript is available on GitHub (https://github.com/elejeune11/Mechanical-MNIST).
2 PAPERS • NO BENCHMARKS YET
A binarized version of MNIST.
10 PAPERS • 1 BENCHMARK
MNIST Multiview Datasets. MNIST is a publicly available dataset consisting of 70,000 images of handwritten digits distributed over ten classes. Two four-view datasets were generated, where each view is a vector in $\mathbb{R}^{14 \times 14}$: MNIST-1 is generated by taking the four quarters of each image as four views; MNIST-2 is generated by taking four overlapping views around the centre of each image, which introduces redundancy between the views.
1 PAPER • NO BENCHMARKS YET
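The two view constructions can be sketched as follows (the crop offsets for the overlapping MNIST-2 views are an assumption; the source does not specify them):

```python
import numpy as np

def quarter_views(images):
    """MNIST-1 style: the four 14x14 quarters of each 28x28 image."""
    tl, tr = images[:, :14, :14], images[:, :14, 14:]
    bl, br = images[:, 14:, :14], images[:, 14:, 14:]
    return [v.reshape(len(images), -1) for v in (tl, tr, bl, br)]

def overlapping_views(images, size=14):
    """MNIST-2 style: four overlapping 14x14 crops around the centre,
    which makes the views redundant. Offsets here are illustrative."""
    offsets = [(3, 3), (3, 11), (11, 3), (11, 11)]
    return [images[:, r:r + size, c:c + size].reshape(len(images), -1)
            for r, c in offsets]
```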
…MedMNIST is standardized to perform classification tasks on lightweight 28x28 images, which requires no background knowledge.
40 PAPERS • NO BENCHMARKS YET
MNIST-M is created by combining MNIST digits with patches randomly extracted from color photos in BSDS500 as their backgrounds. It contains 59,001 training and 90,001 test images.
180 PAPERS • 1 BENCHMARK
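A common way to realize this combination (assumed here; it mirrors the absolute-difference blend used in domain-adaptation codebases, but check the source for the exact recipe):

```python
import numpy as np

def blend_mnist_m(digit, patch):
    """Blend a grayscale MNIST digit into a color background patch by
    taking the per-channel absolute difference.

    digit: (28, 28) uint8; patch: (28, 28, 3) uint8 (e.g. a random
    BSDS500 crop)."""
    diff = patch.astype(np.int16) - digit[..., None].astype(np.int16)
    return np.abs(diff).astype(np.uint8)
```

The difference blend inverts the background colors under the digit strokes, so the digit stays legible on any patch.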
Fashion-MNIST is a dataset of 70,000 28×28 grayscale images of fashion products from 10 categories, with 7,000 images per category. Fashion-MNIST shares the same image size, data format, and training/test split structure with the original MNIST.
2,788 PAPERS • 17 BENCHMARKS
DirtyMNIST is a concatenation of MNIST + AmbiguousMNIST, with 60k samples each in the training set. AmbiguousMNIST contains additional ambiguous digits of varying ambiguity; its test set also contains 60k ambiguous samples.

Additional guidance:
- Pick your initial training samples (for warm-starting Active Learning) from the MNIST half of DirtyMNIST, to avoid starting training with potentially very ambiguous samples, which might add a lot of variance.
- Pick your validation set from the MNIST half as well, for the same reason.
- If you want to split Ambiguous-MNIST into subsets (or Dirty-MNIST within the ambiguous second half), split at multiples of 10 to avoid splitting a flattened multi-label sample.
3 PAPERS • NO BENCHMARKS YET
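The last point (splitting only at multiples of 10) can be enforced with a small helper (hypothetical; not part of the DirtyMNIST codebase):

```python
def ambiguous_split(n_samples, train_frac=0.9):
    """Split Ambiguous-MNIST indices at a multiple of 10 so that no
    flattened multi-label sample (10 consecutive entries) is divided
    across the two subsets."""
    cut = int(n_samples * train_frac) // 10 * 10  # round down to a multiple of 10
    return list(range(cut)), list(range(cut, n_samples))
```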
…The training examples are 20 times smaller than MNIST examples, yet they differentiate more clearly between linear, nonlinear, and convolutional models, which attain 32%, 68%, and 94% accuracy, respectively (these models obtain 94%, 99+%, and 99+% on MNIST).
8 PAPERS • NO BENCHMARKS YET
A simple dataset consisting of three geometric shapes (Triangle, Rectangle, Ellipsoid) of similar sizes but different orientations.
The Neuromorphic-MNIST (N-MNIST) dataset is a spiking version of the original frame-based MNIST dataset. It consists of the same 60,000 training and 10,000 test samples as the original MNIST dataset, captured at the same visual scale (28x28 pixels). The N-MNIST dataset was captured by mounting an ATIS sensor on a motorized pan-tilt unit and moving the sensor while it views MNIST examples on an LCD monitor.
13 PAPERS • 1 BENCHMARK
The MultiMNIST dataset is generated from MNIST. The training and test sets are generated by overlaying a digit on top of another digit from the same set (training or test) but of a different class. For each digit in the MNIST dataset, 1,000 MultiMNIST examples are generated, so the training set contains 60M examples and the test set 10M.
47 PAPERS • 1 BENCHMARK
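The overlay step can be sketched as follows (a simplified sketch: the 36×36 canvas matches the padded size used in the original work, but the 4-pixel shift range and the pixel-wise-maximum merge are assumptions):

```python
import numpy as np

def multi_mnist_pair(base, other, max_shift=4, rng=None):
    """Overlay two 28x28 digits of different classes on a 36x36 canvas,
    each randomly shifted, merging them by pixel-wise maximum."""
    rng = np.random.default_rng(rng)
    canvas = np.zeros((36, 36), dtype=np.uint8)
    for img in (base, other):
        dy, dx = rng.integers(0, 2 * max_shift + 1, size=2)
        region = canvas[dy:dy + 28, dx:dx + 28]
        np.maximum(region, img, out=region)  # region is a view into canvas
    return canvas
```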
…The first modality corresponds to 28 × 28 MNIST images, with 75% of their energy removed by PCA. The audio modality consists of audio samples on which we have computed 112 × 112 spectrograms. Contaminated audio samples are randomly paired with MNIST digits, in accordance with the labels, to obtain 55,000 pairs for training and 10,000 pairs for testing.
The Cifar10Mnist dataset is created from the CIFAR-10 and MNIST data sources. Since the CIFAR-10 training set consists of 50,000 images and the MNIST training set contains 60,000 digits, the first 50,000 MNIST digits are padded on top of the CIFAR-10 images. The remaining 10,000 MNIST digits are padded on top of 10,000 random CIFAR-10 images (with a fixed seed), which makes a second training dataset of 60,000 images possible. For the test set, the 10,000 CIFAR-10 test images are padded over the 10,000 MNIST test digits.
The Mechanical MNIST Crack Path dataset contains finite element simulation results from phase-field models of quasi-static brittle fracture in heterogeneous material domains subjected to prescribed loading. The heterogeneous material distribution is obtained by adding rigid circular inclusions to the domain, using the Fashion MNIST bitmaps as the reference locations for the centers of the inclusions. Specifically, each center point location is generated randomly inside a square region defined by the corresponding Fashion MNIST pixel when the pixel has an intensity value higher than $10$.
MNIST-MIX is a multi-language handwritten digit recognition dataset. It contains digits from 10 different languages.
Common corruptions dataset for MNIST.
45 PAPERS • NO BENCHMARKS YET
MNIST8M is derived from the MNIST dataset by applying random deformations and translations to the dataset.
26 PAPERS • NO BENCHMARKS YET
Kuzushiji-MNIST is a drop-in replacement for the MNIST dataset (28x28 grayscale, 70,000 images). Since MNIST restricts us to 10 classes, the authors chose one character to represent each of the 10 rows of Hiragana when creating Kuzushiji-MNIST. Kuzushiji is a Japanese cursive writing style.
82 PAPERS • 2 BENCHMARKS
A set of synthetic MNIST-style datasets for four orthographies used in Afro-Asiatic and Niger-Congo languages: Ge'ez (Ethiopic), Vai, Osmanya, and N'Ko. These datasets serve as "drop-in" replacements for MNIST.
The Kannada-MNIST dataset is a drop-in substitute for the standard MNIST dataset for the Kannada language.
7 PAPERS • NO BENCHMARKS YET
We introduce the Oracle-MNIST dataset, comprising 28×28 grayscale images of 30,222 ancient characters from 10 categories, for benchmarking pattern classification. Oracle-MNIST shares the same data format with the original MNIST dataset, allowing direct compatibility with all existing classifiers and systems, but it constitutes a more challenging classification task than MNIST. The dataset is freely available at https://github.com/wm-bupt/oracle-mnist.
Typography-MNIST is a dataset comprising 565,292 MNIST-style grayscale images representing 1,812 unique glyphs in varied styles drawn from 1,355 Google Fonts.
The dataset is based on the original MNIST dataset. Compared to the original, the digits are scaled down by a factor of $0.75$ so that there is more space for the random translation. Like PolyMNIST, the dataset consists of 5 different modalities; the additional difficulty compared to the original PolyMNIST is the random translation of the digits.
10 PAPERS • NO BENCHMARKS YET
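The scale-and-translate transform can be sketched as follows (nearest-neighbour downscaling here is a simplification of whatever interpolation the dataset actually uses):

```python
import numpy as np

def translate_digit(img, scale=0.75, rng=None):
    """Shrink a 28x28 digit by `scale` (nearest neighbour) and place it
    at a uniformly random position on an empty 28x28 canvas."""
    rng = np.random.default_rng(rng)
    side = int(round(28 * scale))                  # 21 for scale=0.75
    src = (np.arange(side) / scale).astype(int)    # nearest-neighbour source rows/cols
    small = img[np.ix_(src, src)]
    canvas = np.zeros_like(img)
    y, x = rng.integers(0, 28 - side + 1, size=2)  # random top-left corner
    canvas[y:y + side, x:x + side] = small
    return canvas
```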
The MNIST Large Scale dataset is based on the classic MNIST dataset but contains large scale variations up to a factor of 16. The dataset contains training data for each of the relative scale factors 1, 2, and 4 (relative to the original MNIST dataset), and test data for relative scale factors between 1/2 and 8.
4 PAPERS • 1 BENCHMARK
CI-MNIST (Correlated and Imbalanced MNIST) is a variant of the MNIST dataset that introduces different types of correlations between attributes, dataset features, and an artificial eligibility criterion.
4 PAPERS • NO BENCHMARKS YET
The Mechanical MNIST – Distribution Shift dataset contains the results of finite element simulations of heterogeneous materials subject to large deformation due to equibiaxial extension at a fixed boundary. The Mechanical MNIST dataset is generated by converting the MNIST bitmap images (28x28 pixels, intensity range 0–255) into 2D heterogeneous blocks of material (a 28x28 unit square) with varying modulus. The original bitmap images are sourced from the MNIST Digits dataset (http://www.pymvpa.org/datadb/mnist.html), which corresponds to Mechanical MNIST – MNIST, and the EMNIST Letters dataset (https://www.nist.gov/itl/products-and-services/emnist-dataset), which corresponds to Mechanical MNIST – EMNIST Letters. For each type of data distribution shift, there is one dataset generated from the Mechanical MNIST bitmaps and one from the Mechanical MNIST – EMNIST Letters bitmaps.
N-Digit MNIST is a multi-digit MNIST-like dataset.
5 PAPERS • NO BENCHMARKS YET
We provide multiple human annotations for each test image in Fashion-MNIST. This can be used as soft labels or probabilistic labels instead of the usual hard (single) labels.
DeepFake MNIST+ is a deepfake facial animation dataset. The dataset is generated by a SOTA image animation generator.
MedMNIST v2 is a large-scale MNIST-like collection of standardized biomedical images, including 12 datasets for 2D and 6 datasets for 3D. Covering the primary data modalities in biomedical imaging, MedMNIST v2 is designed for classification on lightweight 2D and 3D images with various data scales (from 100 to 100,000) and diverse tasks. Description and image from: MedMNIST v2: A Large-Scale Lightweight Benchmark for 2D and 3D Biomedical Image Classification. Each subset keeps the same license as its source dataset. Please also cite the corresponding paper of the source data if you use any subset of MedMNIST.
16 PAPERS • NO BENCHMARKS YET
This is a dataset with spurious correlations which can be used to evaluate machine learning methods for out-of-distribution generalization, causal inference, and related fields.
6 PAPERS • 1 BENCHMARK
EMNIST (Extended MNIST) contains 4 times more data than MNIST. It is a set of 28×28 handwritten character images covering digits and letters.
234 PAPERS • 9 BENCHMARKS
See paper:
13 PAPERS • 2 BENCHMARKS
The exact pre-processing steps used to construct the MNIST dataset have long been lost. This leaves us with no reliable way to associate its characters with the IDs of their writers, and little hope of recovering the full MNIST test set, which had 60K images but was never released. The official MNIST test set contains only 10K randomly sampled images and is often considered too small to provide meaningful confidence intervals. The QMNIST dataset was generated from the original data found in NIST Special Database 19, with the goal of matching the MNIST preprocessing as closely as possible.
23 PAPERS • 2 BENCHMARKS
Digits-Five is a collection of five popular digit datasets: MNIST (mt, 55,000 samples), MNIST-M (mm, 55,000 samples), Synthetic Digits (syn, 25,000 samples), SVHN (sv, 73,257 samples), and USPS (up).
…These models are trained on the CIFAR10, Fashion-MNIST, and MNIST datasets. For each dataset, clean and Trojan models are trained for 4 different architectures: ResNet18, VGG19, DenseNet, and GoogLeNet for CIFAR10 and Fashion-MNIST, and 4 custom-designed architectures for MNIST.
Kuzushiji-49 is an MNIST-like dataset that has 49 classes (28x28 grayscale, 270,912 images) from 48 Hiragana characters and one Hiragana iteration mark.
HASY is a dataset of single symbols similar to MNIST. It contains 168,233 instances of 369 classes.
ASCAD (ANSSI SCA Database) is a set of databases that aims to provide a benchmarking reference for the side-channel analysis (SCA) community: the purpose is to have something playing the role for SCA that the MNIST database plays for the machine learning community.
…Based on the proposed library and our analysis, we propose Neural Field Arena, a benchmark consisting of neural field variants of popular vision datasets, including MNIST, CIFAR, and variants of ImageNet. The datasets currently available are: MNIST (SIREN), CIFAR10 (SIREN), MicroImageNet (SIREN), and ShapeNet (SIREN). More datasets will be added in the future.
…Different from other datasets such as the moving MNIST dataset, the samples comprise a goal-oriented task, making the dataset more suitable for testing the prediction capabilities of an ML model.
…Different from the moving MNIST dataset, the samples comprise a goal-oriented task: one object has to fully cover the other object rather than moving randomly, making the dataset better suited for testing the prediction capabilities of an ML model.