The MNIST database (Modified National Institute of Standards and Technology database) is a large collection of handwritten digits. It has a training set of 60,000 examples and a test set of 10,000 examples. It is drawn from the larger NIST Special Database 3 (digits written by employees of the United States Census Bureau) and Special Database 1 (digits written by high school students), which contain monochrome images of handwritten digits. The digits have been size-normalized and centered in a fixed-size image. The original black and white (bilevel) images from NIST were size-normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were then centered in a 28x28 image by computing the center of mass of the pixels and translating the image so as to position this point at the center of the 28x28 field.
6,989 PAPERS • 52 BENCHMARKS
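The centering step described for MNIST can be sketched in a few lines. This is a minimal, dependency-free illustration rather than the original NIST/MNIST tooling: the function name `center_by_mass`, the nested-list image representation, and the rounding of the shift to whole pixels are all assumptions made for the example.

```python
def center_by_mass(digit, size=28):
    """Place a small grayscale digit (list of rows) on a size x size canvas
    so that its intensity center of mass lands near the canvas center."""
    total = sum(sum(row) for row in digit)
    if total == 0:
        raise ValueError("an empty image has no center of mass")
    # Intensity-weighted mean row and column of the digit.
    cy = sum(y * v for y, row in enumerate(digit) for v in row) / total
    cx = sum(x * v for row in digit for x, v in enumerate(row)) / total
    # Whole-pixel shift that moves the center of mass to the canvas center.
    # (Assumption: the shift is rounded; Python rounds exact halves to even.)
    center = (size - 1) / 2
    dy = round(center - cy)
    dx = round(center - cx)
    canvas = [[0] * size for _ in range(size)]
    for y, row in enumerate(digit):
        for x, v in enumerate(row):
            ty, tx = y + dy, x + dx
            if 0 <= ty < size and 0 <= tx < size:  # drop pixels shifted off the canvas
                canvas[ty][tx] = v
    return canvas


# Usage: a 20x20 box holding a 2x2 blob in its upper-left region.
box = [[0] * 20 for _ in range(20)]
for y in (3, 4):
    for x in (5, 6):
        box[y][x] = 255
centered = center_by_mass(box)
```

Centering on the center of mass rather than on the bounding box means that a digit with a heavy stroke on one side is shifted so that its ink, not its outline, is balanced in the field.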
Street View House Numbers (SVHN) is a digit classification benchmark dataset that contains 600,000 32×32 RGB images of printed digits (from 0 to 9) cropped from pictures of house number plates. The cropped images are centered on the digit of interest, but nearby digits and other distractors are kept in the image. SVHN has three sets: a training set, a testing set, and an extra set of 530,000 less difficult images that can be used to help with training.
3,087 PAPERS • 12 BENCHMARKS
Office-Home is a benchmark dataset for domain adaptation which contains 4 domains, each consisting of 65 categories. The four domains are: Art – artistic images in the form of sketches, paintings, ornamentation, etc.; Clipart – a collection of clipart images; Product – images of objects without a background; and Real-World – images of objects captured with a regular camera. It contains 15,500 images, with an average of around 70 images per class and a maximum of 99 images in a class.
940 PAPERS • 11 BENCHMARKS
DomainNet is a dataset of common objects in six different domains. All domains include 345 categories (classes) of objects such as bracelet, plane, bird and cello. The domains are clipart: a collection of clipart images; real: photos and real-world images; sketch: sketches of specific objects; infograph: infographic images of specific objects; painting: artistic depictions of objects in the form of paintings; and quickdraw: drawings by the worldwide players of the game “Quick Draw!”.
609 PAPERS • 10 BENCHMARKS
The Office dataset contains 31 object categories in three domains: Amazon, DSLR and Webcam. The 31 categories in the dataset consist of objects commonly encountered in office settings, such as keyboards, file cabinets, and laptops. The Amazon domain contains on average 90 images per class and 2817 images in total. As these images were taken from a website of online merchants, they are captured against a clean background and at a unified scale. The DSLR domain contains 498 low-noise, high-resolution images (4288×2848). There are 5 objects per category. Each object was captured from different viewpoints, on average 3 times. For Webcam, the 795 low-resolution images (640×480) exhibit significant noise as well as color and white balance artifacts.
593 PAPERS • 7 BENCHMARKS
PACS is an image dataset for domain generalization. It consists of four domains, namely Photo (1,670 images), Art Painting (2,048 images), Cartoon (2,344 images) and Sketch (3,929 images). Each domain contains seven categories.
566 PAPERS • 7 BENCHMARKS
The SYNTHIA dataset is a synthetic dataset that consists of 9400 multi-viewpoint photo-realistic frames rendered from a virtual city and comes with pixel-level semantic annotations for 13 classes. Each frame has a resolution of 1280 × 960.
501 PAPERS • 10 BENCHMARKS
USPS is a digit dataset automatically scanned from envelopes by the U.S. Postal Service containing a total of 9,298 16×16 pixel grayscale samples; the images are centered, normalized and show a broad range of font styles.
417 PAPERS • 4 BENCHMARKS
The GTA5 dataset contains 24966 synthetic images with pixel level semantic annotation. The images have been rendered using the open-world video game Grand Theft Auto 5 and are all from the car perspective in the streets of American-style virtual cities. There are 19 semantic classes which are compatible with the ones of Cityscapes dataset.
379 PAPERS • 7 BENCHMARKS
The German Traffic Sign Recognition Benchmark (GTSRB) contains 43 classes of traffic signs, split into 39,209 training images and 12,630 test images. The images have varying light conditions and rich backgrounds.
317 PAPERS • 3 BENCHMARKS
The Replica Dataset is a dataset of high quality reconstructions of a variety of indoor spaces. Each reconstruction has clean dense geometry, high resolution and high dynamic range textures, glass and mirror surface information, planar segmentation as well as semantic class and instance segmentation.
284 PAPERS • 3 BENCHMARKS
Foggy Cityscapes is a synthetic foggy dataset which simulates fog on real scenes. Each foggy image is rendered with a clear image and depth map from Cityscapes. Thus the annotations and data split in Foggy Cityscapes are inherited from Cityscapes.
207 PAPERS • 6 BENCHMARKS
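The idea of rendering fog from a clear image plus a depth map, as described for Foggy Cityscapes, can be illustrated with the standard optical model of fog, in which each pixel is blended toward a constant atmospheric light according to its distance. This is only a sketch under that assumption: the function name `add_fog`, the attenuation coefficient `beta`, and the `airlight` value are hypothetical and are not taken from the dataset's own pipeline.

```python
import math


def add_fog(clear, depth, beta=0.01, airlight=0.9):
    """Simulate homogeneous fog on a grayscale image.

    clear: 2D list of intensities in [0, 1].
    depth: 2D list of distances (same shape), e.g. in meters.
    beta:  attenuation coefficient (assumed value); larger means denser fog.
    airlight: intensity of the atmospheric light (assumed value).
    """
    foggy = []
    for clear_row, depth_row in zip(clear, depth):
        row = []
        for r, d in zip(clear_row, depth_row):
            t = math.exp(-beta * d)  # transmission: share of scene light that survives
            row.append(r * t + airlight * (1.0 - t))
        foggy.append(row)
    return foggy


# Usage: near pixels stay close to the clear image, far pixels fade to the airlight.
clear = [[0.2, 0.2]]
depth = [[0.0, 500.0]]
foggy = add_fog(clear, depth)
```

Because the fog is applied per pixel without moving any content, the pixel-level labels of the clear image remain valid for the foggy one, which is consistent with the annotations being inherited.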
VisDA-2017 is a simulation-to-real dataset for domain adaptation with over 280,000 images across 12 categories in the training, validation and testing domains. The training images are generated from the same object under different circumstances, while the validation images are collected from MSCOCO.
206 PAPERS • 6 BENCHMARKS
The ImageNet-Sketch data set consists of 50,889 images, approximately 50 images for each of the 1000 ImageNet classes. The data set is constructed with Google Image queries of the form "sketch of __", where __ is the standard class name. Searches are restricted to the "black and white" color scheme. 100 images are initially queried for every class, and the pulled images are cleaned by deleting irrelevant images and images that belong to similar but different classes. For some classes, fewer than 50 images remain after manual cleaning, and the data set is then augmented by flipping and rotating the images.
205 PAPERS • 3 BENCHMARKS
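The flip-and-rotate augmentation mentioned for ImageNet-Sketch can be sketched as follows. The listing does not say which flips or rotation angles were used, so the horizontal flip and the 90-degree clockwise rotation below are assumptions chosen for illustration, as are the function names.

```python
def flip_horizontal(image):
    """Mirror an image (list of rows) left to right."""
    return [row[::-1] for row in image]


def rotate_90_clockwise(image):
    """Rotate an image (list of rows) by 90 degrees clockwise."""
    return [list(column) for column in zip(*image[::-1])]


def augment(image):
    """Return extra variants of one image for classes that are short of samples."""
    return [flip_horizontal(image), rotate_90_clockwise(image)]


# Usage on a tiny 2x2 "image".
variants = augment([[1, 2], [3, 4]])
```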
MNIST-M is created by combining MNIST digits with patches randomly extracted from color photos of BSDS500 as their background. It contains 59,001 training and 9,001 test images.
180 PAPERS • 1 BENCHMARK
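The construction described for MNIST-M, a digit combined with a random color patch, can be sketched as below. The listing does not state how the two are combined; the per-channel absolute difference used here is an assumption, and the function names and the tuple-based image representation are hypothetical.

```python
import random


def random_patch(photo, size, rng):
    """Crop a random size x size patch from an RGB photo (rows of (r, g, b) tuples)."""
    top = rng.randrange(len(photo) - size + 1)
    left = rng.randrange(len(photo[0]) - size + 1)
    return [row[left:left + size] for row in photo[top:top + size]]


def blend_digit(digit, patch):
    """Combine a grayscale digit with an RGB patch of the same size.

    Assumed blend: absolute difference per channel, so the digit stays
    visible whether the background is dark or bright.
    """
    return [
        [tuple(abs(channel - gray) for channel in pixel)
         for gray, pixel in zip(digit_row, patch_row)]
        for digit_row, patch_row in zip(digit, patch)
    ]


# Usage with a uniform 4x4 "photo" and a 2x2 "digit".
photo = [[(10, 20, 30)] * 4 for _ in range(4)]
digit = [[0, 255], [255, 0]]
patch = random_patch(photo, 2, random.Random(0))
sample = blend_digit(digit, patch)
```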
The ImageCLEF-DA dataset is a benchmark dataset for ImageCLEF 2014 domain adaptation challenge, which contains three domains: Caltech-256 (C), ImageNet ILSVRC 2012 (I) and Pascal VOC 2012 (P). For each domain, there are 12 categories and 50 images in each category.
92 PAPERS • 5 BENCHMARKS
The VGG Face dataset is a face identity recognition dataset that consists of 2,622 identities. It contains over 2.6 million images.
88 PAPERS • NO BENCHMARKS YET
IDD is a dataset for road scene understanding in unstructured environments used for semantic segmentation and object detection for autonomous driving. It consists of 10,004 images, finely annotated with 34 classes collected from 182 drive sequences on Indian roads.
86 PAPERS • NO BENCHMARKS YET
SIM10k is a synthetic dataset containing 10,000 images, which is rendered from the video game Grand Theft Auto V (GTA5).
71 PAPERS • 3 BENCHMARKS
This dataset contains 5987 high spatial resolution (0.3 m) remote sensing images from Nanjing, Changzhou, and Wuhan. It focuses on the different geographical environments of urban and rural areas and advances both semantic segmentation and domain adaptation tasks. It poses three considerable challenges: multi-scale objects, complex background samples, and inconsistent class distributions.
45 PAPERS • 1 BENCHMARK
Synscapes is a synthetic dataset for street scene parsing created using photorealistic rendering techniques. It shows state-of-the-art results for training and validation and enables new types of analysis.
42 PAPERS • 1 BENCHMARK
AVD focuses on simulating robotic vision tasks in everyday indoor environments using real imagery. The dataset includes 20,000+ RGB-D images and 50,000+ 2D bounding boxes of object instances densely captured in 9 unique scenes.
29 PAPERS • 1 BENCHMARK
Comic2k is a dataset used for cross-domain object detection which contains 2k comic images with image and instance-level annotations. Image Source: https://naoto0804.github.io/cross_domain_detection/
27 PAPERS • 7 BENCHMARKS
VIDIT is a reference evaluation benchmark intended to push forward the development of illumination manipulation methods. VIDIT includes 390 different Unreal Engine scenes, each captured with 40 illumination settings, resulting in 15,600 images. The illumination settings are all the combinations of 5 color temperatures (2500K, 3500K, 4500K, 5500K and 6500K) and 8 light directions (N, NE, E, SE, S, SW, W, NW). The original image resolution is 1024x1024.
20 PAPERS • 1 BENCHMARK
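The VIDIT image count follows directly from the combinations listed in its description, which a short check makes explicit. The variable names are chosen for this example only.

```python
from itertools import product

COLOR_TEMPERATURES_K = [2500, 3500, 4500, 5500, 6500]
LIGHT_DIRECTIONS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
NUM_SCENES = 390

# Every (temperature, direction) pair is one illumination setting.
illumination_settings = list(product(COLOR_TEMPERATURES_K, LIGHT_DIRECTIONS))

# Each scene is captured once per setting.
total_images = NUM_SCENES * len(illumination_settings)

print(len(illumination_settings), total_images)  # prints "40 15600"
```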
VehicleX is a large-scale synthetic dataset. Created in Unity, it contains 1,362 vehicles of various 3D models with fully editable attributes.
16 PAPERS • NO BENCHMARKS YET
REFUGE Challenge provides a data set of 1200 fundus images with ground truth segmentations and clinical glaucoma labels, currently the largest existing one.
13 PAPERS • 5 BENCHMARKS
CASIA V2 is a dataset for forgery classification. It contains 4795 images, 1701 authentic and 3274 forged.
12 PAPERS • NO BENCHMARKS YET
So2Sat LCZ42 consists of local climate zone (LCZ) labels of about half a million Sentinel-1 and Sentinel-2 image patches in 42 urban agglomerations (plus 10 additional smaller areas) across the globe. This dataset was labeled by 15 domain experts following a carefully designed labeling workflow and evaluation process over a period of six months.
11 PAPERS • 1 BENCHMARK
Adaptiope is a domain adaptation dataset with 123 classes in the three domains synthetic, product and real life. One of the main goals of Adaptiope is to offer a clean and well curated set of images for domain adaptation. This was necessary as many other common datasets in the area suffer from label noise and low quality images. Additionally, Adaptiope's class set was chosen in a way that minimizes the overlap with the class set of the commonly used ImageNet pretraining, therefore preventing information leakage in a domain adaptation setup.
9 PAPERS • NO BENCHMARKS YET
Office-Caltech-10 is a standard benchmark for domain adaptation, which consists of the Office 10 and Caltech 10 datasets. It contains the 10 overlapping categories between the Office dataset and the Caltech256 dataset. SURF BoW histogram features, vector quantized to 800 dimensions, are also available for this dataset.
9 PAPERS • 1 BENCHMARK
Under a close collaboration with an expert radiologist team of the Hospital Universitario San Cecilio, the COVIDGR-1.0 dataset of patients' anonymized X-ray images has been built. 852 images have been collected following a strict labeling protocol. They are categorized into 426 positive cases and 426 negative cases. Positive images correspond to patients who have been tested positive for COVID-19 using RT-PCR within a time span of at most 24h between the X-ray image and the test. Every image has been taken using the same type of equipment and with the same format: only the posterior-anterior view is considered.
7 PAPERS • NO BENCHMARKS YET
The Cross-dataset Testbed is a Decaf7 based cross-dataset image classification dataset, which contains 40 categories of images from 3 domains: 3,847 images in Caltech256, 4,000 images in ImageNet, and 2,626 images for SUN. In total there are 10,473 images of 40 categories from these three domains.
Modern Office-31 is a refurbished version of the commonly used Office-31 dataset. Modern Office-31 rectifies many of the annotation errors and low quality images in the Amazon domain of the original Office-31 dataset. Additionally, this dataset adds another synthetic domain based on the Adaptiope dataset.
6 PAPERS • NO BENCHMARKS YET
CocoDoom is a collection of pre-recorded data extracted from Doom gaming sessions along with annotations in the MS Coco format.
4 PAPERS • NO BENCHMARKS YET
The CropAndWeed dataset is focused on the fine-grained identification of 74 relevant crop and weed species with a strong emphasis on data variability. Annotations of labeled bounding boxes, semantic masks and stem positions are provided for about 112k instances in more than 8k high-resolution images of both real-world agricultural sites and specifically cultivated outdoor plots of rare weed types. Additionally, each sample is enriched with meta-annotations regarding environmental conditions.
This dataset contains 114 individuals including 1824 images captured from two disjoint camera views. For each person, eight images are captured from eight different orientations under one camera view and are normalized to 128x48 pixels. This dataset is also split into two parts randomly. One contains 57 individuals for training, and the other contains 57 individuals for testing.
4 PAPERS • 1 BENCHMARK
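The random identity split described for the 114-person dataset above (57 individuals for training, 57 for testing) can be sketched like this. The function name, the integer identity labels, and the seed are hypothetical; the listing only says that the split is random.

```python
import random


def split_identities(num_ids=114, seed=0):
    """Randomly split identity labels 0..num_ids-1 into two equal halves."""
    ids = list(range(num_ids))
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    half = num_ids // 2
    return sorted(ids[:half]), sorted(ids[half:])


train_ids, test_ids = split_identities()

# Sanity check on the stated totals: 114 people x 2 camera views x 8 orientations.
expected_images = 114 * 2 * 8
```

Splitting by identity rather than by image keeps every picture of a given person on one side of the split, so test identities are never seen during training.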
We design an all-day semantic segmentation benchmark, all-day CityScapes. It is the first semantic segmentation benchmark that contains samples from all-day scenarios, i.e., from dawn to night. Our dataset will be made publicly available at [https://isis-data.science.uva.nl/cv/1ADcityscape.zip].
3 PAPERS • 1 BENCHMARK
Unsupervised Domain Adaptation demonstrates great potential to mitigate domain shifts by transferring models from labeled source domains to unlabeled target domains. While Unsupervised Domain Adaptation has been applied to a wide variety of complex vision tasks, only a few works focus on lane detection for autonomous driving. This can be attributed to the lack of publicly available datasets. To facilitate research in these directions, we propose CARLANE, a 3-way sim-to-real domain adaptation benchmark for 2D lane detection. CARLANE encompasses the single-target datasets MoLane and TuLane and the multi-target dataset MuLane. These datasets are built from three different domains, which cover diverse scenes and contain a total of 163K unique images, 118K of which are annotated. In addition, we evaluate and report systematic baselines, including our own method, which builds upon Prototypical Cross-domain Self-supervised Learning. We find that false positive and false negative rates of the eva
3 PAPERS • 3 BENCHMARKS
The Five-Billion-Pixels dataset contains more than 5 billion labeled pixels of 150 high-resolution Gaofen-2 (4 m) satellite images, annotated in a 24-category system covering artificial-constructed, agricultural, and natural classes. It offers rich categories, large coverage, wide distribution, and high spatial resolution, which reflect the distributions of real-world ground objects well and can benefit different land-cover-related studies.
3 PAPERS • NO BENCHMARKS YET
This dataset covers 5 domains (synthetic, document, street view, handwritten, and car license) and contains over five million images.
2 PAPERS • 2 BENCHMARKS
The Mila Simulated Floods Dataset is a 1.5 square km virtual world built with the Unity3D game engine, including urban, suburban and rural areas.
2 PAPERS • 1 BENCHMARK
NHA12D is an annotated pavement crack dataset that contains images with different viewpoints and pavement types. This dataset is composed of 80 pavement images, including 40 concrete pavement images and 40 asphalt pavement images, captured by digital survey vehicles on the A12 network in the UK.
2 PAPERS • NO BENCHMARKS YET
Pano3D is a new benchmark for depth estimation from spherical panoramas. Its goal is to drive progress for this task in a consistent and holistic manner. The Pano3D 360 depth estimation benchmark provides a standard Matterport3D train and test split, as well as a secondary GibsonV2 partitioning for testing and training. The latter is used to assess zero-shot cross-dataset transfer performance and is decomposed into 3 different splits, each focusing on a specific generalization axis.
The Apron Dataset focuses on training and evaluating classification and detection models for airport-apron logistics. In addition to bounding boxes and object categories the dataset is enriched with meta parameters to quantify the models’ robustness against environmental influences.
1 PAPER • NO BENCHMARKS YET
The goal of this project is to present two new datasets that seek to expand the capability of the Learning to See in the Dark Low-light enhancement CNN for the Canon 6D DSLR, and explore how the network performs when modified in various ways, both pruning it and making it deeper.
1 PAPER • 2 BENCHMARKS
Dataset release for the BMVC 2021 Paper "Few-Shot Domain Adaptation for Low Light RAW Image Enhancement"
Open MIC (Open Museum Identification Challenge) contains photos of exhibits captured in 10 distinct exhibition spaces of several museums which showcase paintings, timepieces, sculptures, glassware, relics, science exhibits, natural history pieces, ceramics, pottery, tools and indigenous crafts. The goal of Open MIC is to stimulate research in domain adaptation, egocentric recognition and few-shot learning by providing a testbed complementary to the famous Office 31.
Our proposed Synthetic-to-Real benchmark for more practical visual DA (termed S2RDA) includes two challenging transfer tasks, S2RDA-49 and S2RDA-MS-39. In each task, source/synthetic domain samples are synthesized by rendering 3D models from ShapeNet. The 3D models used share the label space of the target/real domain, and each class has 12K rendered RGB images. The real domain of S2RDA-49 comprises 60,535 images of 49 classes, collected from the ImageNet validation set, ObjectNet, the VisDA-2017 validation set, and the web. For S2RDA-MS-39, the real domain collects 41,735 natural images exclusive for 39 classes from MetaShift, which contain complex and distinct contexts, e.g., object presence (co-occurrence of different objects), general contexts (indoor or outdoor), and object attributes (color or shape), leading to a much harder task. Compared to VisDA-2017, our S2RDA contains more categories, more realistically synthesized source domain data coming for free, and more complicated targ