The ImageNet dataset contains 14,197,122 images annotated according to the WordNet hierarchy. Since 2010, the dataset has been used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a benchmark in image classification and object detection. The publicly released dataset contains a set of manually annotated training images; a set of test images is also released, with the manual annotations withheld. ILSVRC annotations fall into one of two categories: (1) image-level annotation of a binary label for the presence or absence of an object class in the image, e.g., “there are cars in this image” but “there are no tigers,” and (2) object-level annotation of a tight bounding box and class label around an object instance in the image, e.g., “there is a screwdriver centered at position (20,25) with a width of 50 pixels and a height of 30 pixels.” The ImageNet project does not own the copyright of the images; therefore, only thumbnails and URLs of the images are provided.
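A minimal sketch of the two ILSVRC annotation types as plain Python structures; the field and class names here are hypothetical, not the official schema.

```python
# Hypothetical representation of the two ILSVRC annotation types.
from dataclasses import dataclass

@dataclass
class ImageLevelAnnotation:
    wnid: str      # WordNet synset ID for the object class
    present: bool  # is the class present anywhere in the image?

@dataclass
class ObjectLevelAnnotation:
    wnid: str
    x_center: int  # bounding-box center, in pixels
    y_center: int
    width: int
    height: int

# "there is a screwdriver centered at (20, 25), 50 px wide and 30 px tall"
box = ObjectLevelAnnotation(wnid="n04154565",  # illustrative WordNet ID
                            x_center=20, y_center=25, width=50, height=30)
```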
11,816 PAPERS • 111 BENCHMARKS
Fashion-MNIST is a dataset of 70,000 28×28 grayscale images of fashion products from 10 categories, with 7,000 images per category. The training set has 60,000 images and the test set has 10,000 images. Fashion-MNIST shares the same image size, data format, and training/testing split structure as the original MNIST.
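As an example, both splits can be loaded with torchvision's built-in FashionMNIST wrapper (assuming torchvision is installed; the root path is a placeholder):

```python
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()  # 28x28 grayscale -> 1x28x28 float tensor

train_set = datasets.FashionMNIST(root="./data", train=True, download=True, transform=to_tensor)
test_set = datasets.FashionMNIST(root="./data", train=False, download=True, transform=to_tensor)

print(len(train_set), len(test_set))  # 60000 10000
```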
2,456 PAPERS • 18 BENCHMARKS
Office-Home is a benchmark dataset for domain adaptation consisting of 4 domains, each with 65 categories. The four domains are: Art – artistic images in the form of sketches, paintings, ornamentation, etc.; Clipart – a collection of clipart images; Product – images of objects without a background; and Real-World – images of objects captured with a regular camera. It contains 15,500 images, with an average of around 70 images per class and a maximum of 99 images in a class.
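A minimal sketch of a leave-one-domain-out split, assuming the common on-disk layout of one folder per domain with one subfolder per class (folder names and paths are assumptions):

```python
from torchvision import datasets, transforms

tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

# Train on one domain, evaluate on another, e.g. Art -> Clipart.
source = datasets.ImageFolder("OfficeHome/Art", transform=tf)
target = datasets.ImageFolder("OfficeHome/Clipart", transform=tf)

assert len(source.classes) == 65  # 65 shared categories per domain
```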
765 PAPERS • 11 BENCHMARKS
PACS is an image dataset for domain generalization. It consists of four domains, namely Photo (1,670 images), Art Painting (2,048 images), Cartoon (2,344 images) and Sketch (3,929 images). Each domain contains seven categories.
458 PAPERS • 7 BENCHMARKS
DomainNet is a dataset of common objects in six different domains. All domains include 345 categories (classes) of objects, such as bracelet, plane, bird, and cello. The domains are: clipart – a collection of clipart images; real – photos and real-world images; sketch – sketches of specific objects; infograph – infographic images featuring a specific object; painting – artistic depictions of objects in the form of paintings; and quickdraw – drawings by players of the game “Quick, Draw!” from around the world.
455 PAPERS • 10 BENCHMARKS
ImageNet-C is an open-source dataset that consists of algorithmically generated corruptions (e.g., blur, noise) applied to the ImageNet test set.
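The official ImageNet-C corruptions ship as precomputed images; purely as an illustration of the idea, a Gaussian-noise corruption with severity levels might look like the sketch below (the noise scales are arbitrary, not the benchmark's values):

```python
import numpy as np

def gaussian_noise(image: np.ndarray, severity: int = 1) -> np.ndarray:
    """image: HxWxC uint8 array; severity: 1 (mild) to 5 (severe)."""
    sigma = [0.04, 0.06, 0.08, 0.10, 0.15][severity - 1]  # illustrative scales
    x = image.astype(np.float32) / 255.0
    x = x + np.random.normal(scale=sigma, size=x.shape)
    return (np.clip(x, 0.0, 1.0) * 255).astype(np.uint8)
```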
398 PAPERS • 3 BENCHMARKS
The ImageNet-A dataset consists of real-world, unmodified, and naturally occurring examples that are misclassified by ResNet models.
235 PAPERS • 5 BENCHMARKS
ImageNet-R(endition) contains art, cartoons, deviantart, graffiti, embroidery, graphics, origami, paintings, patterns, plastic objects, plush objects, sculptures, sketches, tattoos, toys, and video game renditions of ImageNet classes.
226 PAPERS • 4 BENCHMARKS
The ImageNet-Sketch dataset consists of 50,000 images, 50 for each of the 1,000 ImageNet classes. The dataset is constructed with Google Image queries of the form "sketch of <class name>", where <class name> is the standard class name, restricted to the "black and white" color scheme. 100 images are initially queried for every class, and the retrieved images are cleaned by deleting irrelevant images and images of similar but different classes. For classes left with fewer than 50 images after manual cleaning, the dataset is augmented by flipping and rotating the images.
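The flip-and-rotate augmentation described above could be sketched with PIL as follows (the angles and the helper name are illustrative, not the authors' code):

```python
from PIL import Image

def augment(img: Image.Image) -> list:
    """Return flipped and rotated variants of a sketch image."""
    variants = [img.transpose(Image.FLIP_LEFT_RIGHT)]
    for angle in (90, 180, 270):
        variants.append(img.rotate(angle, expand=True))
    return variants
```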
141 PAPERS • 3 BENCHMARKS
The Stylized-ImageNet dataset is created by removing local texture cues in ImageNet while retaining global shape information on natural images via AdaIN style transfer. This nudges CNNs towards learning more about shapes and less about local textures.
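The AdaIN operation itself is simple: it re-normalizes content features channel-wise to match the mean and standard deviation of style features. A minimal sketch in PyTorch, assuming NCHW feature maps:

```python
import torch

def adain(content: torch.Tensor, style: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """AdaIN(x, y) = sigma(y) * (x - mu(x)) / sigma(x) + mu(y), per channel."""
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content - c_mean) / c_std + s_mean
```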
85 PAPERS • 1 BENCHMARK
WildDash is a benchmark that uses meta-information to calculate the robustness of a given algorithm with respect to individual hazards.
39 PAPERS • 2 BENCHMARKS
ImageNet-P consists of noise, blur, weather, and digital distortions. The dataset provides validation perturbations at several difficulty levels; comes in CIFAR-10, Tiny ImageNet, ImageNet 64×64, standard, and Inception-sized editions; and is designed for benchmarking, not training, networks. ImageNet-P departs from ImageNet-C by generating perturbation sequences from each ImageNet validation image. Each sequence contains more than 30 frames, so to counteract the increase in dataset size and evaluation time, only 10 common perturbations are used.
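Evaluation on ImageNet-P centers on prediction stability across a perturbation sequence. A hedged sketch of the underlying flip-rate idea (without the benchmark's full mFR normalization):

```python
from typing import Sequence

def flip_rate(predictions: Sequence[int]) -> float:
    """Fraction of consecutive frames where the top-1 prediction changes."""
    flips = sum(a != b for a, b in zip(predictions, predictions[1:]))
    return flips / (len(predictions) - 1)

print(flip_rate([3, 3, 7, 7, 3]))  # 0.5
```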
26 PAPERS • 1 BENCHMARK
Our goal is to improve upon the status quo for designing image classification models trained in one domain that perform well on images from another domain. Complementing existing work in robustness testing, we introduce the first test dataset for this purpose that comes from an authentic use case, where photographers wanted to learn about the content in their images. We built a new test set using 8,900 images taken by people who are blind, for which we collected metadata to indicate the presence versus absence of 200 ImageNet object categories. We call this dataset VizWiz-Classification.
20 PAPERS • 1 BENCHMARK
VLCS is a benchmark dataset for testing domain generalization, built from four photographic datasets (PASCAL VOC2007, LabelMe, Caltech101, and SUN09).
19 PAPERS • NO BENCHMARKS YET
The goal of the NICO Challenge is to facilitate OOD (Out-of-Distribution) generalization in visual recognition by promoting research on intrinsic learning mechanisms with native invariance and generalization ability. The training data is a mixture of several observed contexts, while the test data is composed of unseen contexts. Participants are tasked with developing algorithms that are reliable across different contexts (domains) to improve the generalization ability of models.
15 PAPERS • NO BENCHMARKS YET
The MSK dataset is a dataset for lesion recognition from the Memorial Sloan-Kettering Cancer Center. It is used as part of the ISIC lesion recognition challenges.
11 PAPERS • NO BENCHMARKS YET
The I.I.D. hypothesis between training and testing data is the basis of numerous image classification methods, but it can hardly be guaranteed in practice, where Non-I.I.D.-ness is common and causes unstable performance in these models. In the literature, however, the Non-I.I.D. image classification problem is largely understudied. A key reason is the lack of a well-designed dataset to support related research. In this paper, we construct and release a Non-I.I.D. image dataset called NICO, which uses contexts to create Non-I.I.D.-ness consciously. Compared to other datasets, extended analyses prove NICO can support various Non-I.I.D. situations with sufficient flexibility. Meanwhile, we propose a baseline model with a ConvNet structure for general Non-I.I.D. image classification, where the distribution of testing data is unknown but different from that of the training data. The experimental results demonstrate that NICO can well support the training of a ConvNet model from scratch, and a batch balancing module can help ConvNets alleviate the Non-I.I.D. problem.
8 PAPERS • 2 BENCHMARKS
The Sims4Action dataset is a video-game-based dataset for Synthetic→Real domain adaptation in human activity recognition.
5 PAPERS • NO BENCHMARKS YET
This prostate MRI segmentation dataset was collected from six different data sources.
3 PAPERS • NO BENCHMARKS YET
Wild-Time is a benchmark of 5 datasets that reflect temporal distribution shifts arising in a variety of real-world applications, including patient prognosis and news classification. On these datasets, we systematically benchmark 13 prior approaches, including methods in domain generalization, continual learning, self-supervised learning, and ensemble learning.
Caltech Fish Counting Dataset (CFC) is a large-scale dataset for detecting, tracking, and counting fish in sonar videos. This dataset contains over 1,500 videos sourced from seven different sonar cameras.
1 PAPER • NO BENCHMARKS YET
Super-CLEVR is a dataset for Visual Question Answering (VQA) in which different factors in VQA domain shifts can be isolated so that their effects can be studied independently. It contains 21 vehicle models belonging to 5 categories, with controllable attributes. Four factors are considered: visual complexity, question redundancy, concept distribution, and concept compositionality.
The Tufts fNIRS to Mental Workload (fNIRS2MW) open-access dataset is a new dataset for building machine learning classifiers that can consume a short window (30 seconds) of multivariate fNIRS recordings and predict the mental workload intensity of the user during that window.
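Purely as an illustration, slicing a recording into such windows might look like this (the sampling rate and array layout are assumptions, not the dataset's official loader):

```python
import numpy as np

def windows(recording: np.ndarray, fs_hz: float, seconds: float = 30.0) -> np.ndarray:
    """recording: (time, channels) array -> (n_windows, window_len, channels)."""
    win = int(fs_hz * seconds)
    n = recording.shape[0] // win
    return recording[: n * win].reshape(n, win, recording.shape[1])

x = np.random.randn(6000, 8)         # e.g. 10 minutes at 10 Hz, 8 channels
print(windows(x, fs_hz=10.0).shape)  # (20, 300, 8)
```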
0 PAPERS • NO BENCHMARKS YET