The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.
15,627 PAPERS • 108 BENCHMARKS
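The split sizes above can be verified directly when loading the data; below is a minimal loading sketch assuming torchvision is installed and ./data is used as the download directory.

```python
from torchvision.datasets import CIFAR10

# Download the dataset (assumed destination: ./data) and load both splits.
train_set = CIFAR10(root="./data", train=True, download=True)
test_set = CIFAR10(root="./data", train=False, download=True)

print(len(train_set), len(test_set))  # expected: 50000 10000

# Each sample is a (PIL image, class index) pair; class names are exposed as well.
image, label = train_set[0]
print(image.size, train_set.classes[label])  # (32, 32) and a class name such as 'frog'
```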
The CIFAR-100 dataset (Canadian Institute for Advanced Research, 100 classes) is a subset of the Tiny Images dataset and consists of 60000 32x32 color images. The 100 classes in the CIFAR-100 are grouped into 20 superclasses. There are 600 images per class. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs). There are 500 training images and 100 testing images per class.
8,697 PAPERS • 58 BENCHMARKS
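The torchvision wrapper only exposes the fine labels, so the sketch below reads both label levels from the official "python version" archive; it assumes the archive has been extracted to ./data/cifar-100-python and that the standard pickle keys are present.

```python
import pickle
import numpy as np

# Sketch: read both label levels from the extracted "python version" archive.
with open("./data/cifar-100-python/train", "rb") as f:
    batch = pickle.load(f, encoding="latin1")

data = np.asarray(batch["data"]).reshape(-1, 3, 32, 32)   # 50000 images, CHW layout
fine = np.asarray(batch["fine_labels"])                    # 100 fine classes
coarse = np.asarray(batch["coarse_labels"])                # 20 superclasses

print(data.shape, fine.max() + 1, coarse.max() + 1)        # (50000, 3, 32, 32) 100 20
```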
The Food-101 dataset consists of 101 food categories with 750 training and 250 test images per category, making a total of 101k images. The labels for the test images have been manually cleaned, while the training set contains some noise.
751 PAPERS • 15 BENCHMARKS
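A minimal loading sketch using torchvision's Food101 wrapper (available in recent torchvision releases); the printed split sizes should match the figures above, and the ./data path is an assumption.

```python
from torchvision.datasets import Food101

# Assumed destination: ./data; the archive is several GB, so download may take a while.
train_set = Food101(root="./data", split="train", download=True)
test_set = Food101(root="./data", split="test", download=True)

# 101 categories x 750 training images and 101 x 250 test images.
print(len(train_set), len(test_set))  # expected: 75750 25250
```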
VoxCeleb1 is an audio dataset containing over 100,000 utterances for 1,251 celebrities, extracted from videos uploaded to YouTube.
666 PAPERS • 10 BENCHMARKS
Clothing1M contains 1M clothing images in 14 classes. It is a dataset with noisy labels, since the data was collected from several online shopping websites and includes many mislabelled samples. The dataset also contains 50k, 14k, and 10k images with clean labels for training, validation, and testing, respectively.
285 PAPERS • 4 BENCHMARKS
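Because Clothing1M is distributed as image folders plus key/value annotation files rather than as a torchvision dataset, a small custom Dataset is usually written for it. The sketch below assumes a commonly distributed layout; the file name noisy_label_kv.txt and the "<path> <label>" line format are assumptions about your local copy.

```python
import os
from PIL import Image
from torch.utils.data import Dataset

class Clothing1M(Dataset):
    """Sketch of a dataset over Clothing1M's key/value annotation files (layout assumed)."""

    def __init__(self, root, label_file="noisy_label_kv.txt", transform=None):
        self.root, self.transform = root, transform
        self.samples = []
        with open(os.path.join(root, label_file)) as f:
            for line in f:
                path, label = line.split()          # assumed format: "<relative path> <class index>"
                self.samples.append((path, int(label)))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        image = Image.open(os.path.join(self.root, path)).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        return image, label
```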
The WebVision dataset is designed to facilitate research on learning visual representations from noisy web data. It is a large-scale dataset of web images containing more than 2.4 million images crawled from the Flickr website and the Google Images search engine.
177 PAPERS • 4 BENCHMARKS
CIFAR-10N and CIFAR-100N are two benchmark datasets that equip the training sets of CIFAR-10 and CIFAR-100 with human-annotated, real-world noisy labels collected from Amazon Mechanical Turk.
98 PAPERS • 6 BENCHMARKS
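The sketch below pairs the CIFAR-10 training images with one set of the human-annotated noisy labels and computes the resulting noise rate; the file name CIFAR-10_human.pt and the label keys follow the authors' release and should be treated as assumptions about your local copy.

```python
import torch
from torchvision.datasets import CIFAR10

# Assumed file name and keys from the CIFAR-N release.
noise = torch.load("./data/CIFAR-10_human.pt")
noisy_labels = noise["worse_label"]          # other assumed keys: "aggre_label", "random_label1", ...

train_set = CIFAR10(root="./data", train=True, download=True)
assert len(noisy_labels) == len(train_set)   # noisy labels are aligned with the training order

# Noise rate: fraction of training samples whose noisy label disagrees with the clean label.
clean_labels = torch.tensor(train_set.targets)
noise_rate = (torch.tensor(noisy_labels) != clean_labels).float().mean()
print(f"noise rate: {noise_rate:.2%}")
```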
65 PAPERS • 1 BENCHMARK
The Chaoyang dataset contains 1,111 normal, 842 serrated, 1,404 adenocarcinoma, and 664 adenoma samples for training, and 705 normal, 321 serrated, 840 adenocarcinoma, and 273 adenoma samples for testing. This noisy dataset was constructed in a real-world clinical annotation scenario.
17 PAPERS • 2 BENCHMARKS
ANIMAL-10N contains 10 animal classes with 50,000 training and 5,000 testing images. Its noisy labels were introduced naturally by human annotation mistakes, with an estimated noise rate of 8%.
15 PAPERS • 1 BENCHMARK
Approx. 300,000 images of galaxies labelled by shape.
6 PAPERS • 1 BENCHMARK
Part of the Controlled Noisy Web Labels Dataset.
5 PAPERS • 2 BENCHMARKS
COCO-N Medium introduces a stochastic benchmark that simulates common real-world scenarios with noticeable label inaccuracies in the COCO dataset. The benchmark combines class and spatial noise to create a challenging yet realistic evaluation framework for instance segmentation models, mimicking datasets manually annotated by crowd workers, where a moderate level of label noise is expected. By incorporating both class and spatial inaccuracies, COCO-N Medium allows researchers to assess their models' basic robustness to label noise, providing insight into performance in typical real-world applications where perfect annotations are rare. This medium-level benchmark serves as a middle ground, offering a more rigorous test than minimally noisy datasets while remaining within the bounds of commonly encountered data-quality issues, and it enables a nuanced evaluation of model performance under realistic conditions, helping identify areas for improvement in handling noisy labels.
1 PAPER • 1 BENCHMARK
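To make the notion of combined class and spatial noise concrete, the sketch below injects both kinds of noise into COCO-format annotations. It is purely illustrative, not the procedure used to build COCO-N Medium: the flip probability and jitter scale are assumptions, and for brevity it perturbs bounding boxes rather than segmentation masks.

```python
import json
import random

# Illustrative parameters; not the settings used to build COCO-N Medium.
FLIP_PROB, JITTER_SCALE = 0.2, 0.1

with open("instances_train2017.json") as f:          # assumed path to COCO annotations
    coco = json.load(f)

category_ids = [c["id"] for c in coco["categories"]]

for ann in coco["annotations"]:
    # Class noise: replace the label with a random category.
    if random.random() < FLIP_PROB:
        ann["category_id"] = random.choice(category_ids)
    # Spatial noise: perturb the bounding box proportionally to its size.
    x, y, w, h = ann["bbox"]
    ann["bbox"] = [
        x + random.uniform(-JITTER_SCALE, JITTER_SCALE) * w,
        y + random.uniform(-JITTER_SCALE, JITTER_SCALE) * h,
        max(1.0, w * (1 + random.uniform(-JITTER_SCALE, JITTER_SCALE))),
        max(1.0, h * (1 + random.uniform(-JITTER_SCALE, JITTER_SCALE))),
    ]

with open("instances_train2017_noisy.json", "w") as f:
    json.dump(coco, f)
```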
The COCO-WAN benchmark is designed to assess the impact of weak-annotation noise (labels produced with semi-automated annotation tools) on instance segmentation models. It is built upon the COCO dataset and incorporates noise generated through weak annotations, simulating real-world scenarios where annotations may be imperfect due to semi-automated tooling. It includes various levels of noise to challenge the robustness and generalization capabilities of segmentation models.