DIV2K is a popular single-image super-resolution dataset which contains 1,000 images with different scenes and is splitted to 800 for training, 100 for validation and 100 for testing. It was collected for NTIRE2017 and NTIRE2018 Super-Resolution Challenges in order to encourage research on image super-resolution with more realistic degradation. This dataset contains low resolution images with different types of degradations. Apart from the standard bicubic downsampling, several types of degradations are considered in synthesizing low resolution images for different tracks of the challenges. Track 2 of NTIRE 2017 contains low resolution images with unknown x4 downscaling. Track 2 and track 4 of NTIRE 2018 correspond to realistic mild ×4 and realistic wild ×4 adverse conditions, respectively. Low-resolution images under realistic mild x4 setting suffer from motion blur, Poisson noise and pixel shifting. Degradations under realistic wild x4 setting are further extended to be of different
621 PAPERS • 3 BENCHMARKS
SIDD is an image denoising dataset containing 30,000 noisy images from 10 scenes under different lighting conditions using five representative smartphone cameras. Ground truth images are provided along with the noisy images.
235 PAPERS • 2 BENCHMARKS
The See-in-the-Dark (SID) dataset contains 5094 raw short-exposure images, each with a corresponding long-exposure reference image. Images were captured using two cameras: Sony α7SII and Fujifilm X-T2.
144 PAPERS • 3 BENCHMARKS
Color BSD68 dataset for image denoising benchmarks is part of The Berkeley Segmentation Dataset and Benchmark. It is used for measuring image denoising algorithms performance. It contains 68 images.
134 PAPERS • 16 BENCHMARKS
The BirdSong dataset consists of audio recordings of bird songs at the H. J. Andrews (HJA) Experimental Forest, using unattended microphones. The goal of the dataset is to provide data to automatically identify the species of bird responsible for each utterance in these recordings. The dataset contains 548 10-seconds audio recordings.
33 PAPERS • NO BENCHMARKS YET
PolyU Dataset is a large dataset of real-world noisy images with reasonably obtained corresponding “ground truth” images. The basic idea is to capture the same and unchanged scene for many (e.g., 500) times and compute their mean image, which can be roughly taken as the “ground truth” image for the real-world noisy images. The rational of this strategy is that for each pixel, the noise is generated randomly larger or smaller than 0. Sampling the same pixel many times and computing the average value will approximate the truth pixel value and alleviate significantly the noise.
30 PAPERS • 1 BENCHMARK
The CRVD dataset consists of 55 groups of noisy-clean videos with ISO values ranging from 1600 to 25600.
24 PAPERS • 1 BENCHMARK
Benchmarking Denoising Algorithms with Real Photographs
24 PAPERS • 2 BENCHMARKS
The Iris flower data set or Fisher's Iris data set is a multivariate data set introduced by the British statistician, eugenicist, and biologist Ronald Fisher in his 1936 paper The use of multiple measurements in taxonomic problems as an example of linear discriminant analysis. It is sometimes called Anderson's Iris data set because Edgar Anderson collected the data to quantify the morphologic variation of Iris flowers of three related species. Two of the three species were collected in the Gaspé Peninsula "all from the same pasture, and picked on the same day and measured at the same time by the same person with the same apparatus".
18 PAPERS • 8 BENCHMARKS
The Fluorescence Microscopy Denoising (FMD) dataset is dedicated to Poisson-Gaussian denoising. The dataset consists of 12,000 real fluorescence microscopy images obtained with commercial confocal, two-photon, and wide-field microscopes and representative biological samples such as cells, zebrafish, and mouse brain tissues. Image averaging is used to effectively obtain ground truth images and 60,000 noisy images with different noise levels.
16 PAPERS • 2 BENCHMARKS
the dataset contains data about hydrogen storage in metal hydrides
13 PAPERS • 2 BENCHMARKS
An open dataset of real photographs with real noise, from identical scenes captured with varying ISO values. Most images are taken with a Fujifilm X-T1 and XF18-55mm, other photographers are encouraged to contribute images for a more diverse crowdsourced effort.
8 PAPERS • NO BENCHMARKS YET
Maximilian B. Kiss, Sophia B. Coban, K. Joost Batenburg, Tristan van Leeuwen, and Felix Lucka "2DeteCT - A large 2D expandable, trainable, experimental Computed Tomography dataset for machine learning", Sci Data 10, 576 (2023) or arXiv:2306.05907 (2023)
5 PAPERS • NO BENCHMARKS YET
The Raider dataset collects fMRI recordings of 1000 voxels from the ventral temporal cortex, for 10 healthy adult participants passively watching the full-length movie “Raiders of the Lost Ark”.
CommitBART is a benchmark for researching commit-related task such as denoising, cross-modal generation and contrastive learning. The dataset contains over 7 million commits across 7 programming languages.
3 PAPERS • NO BENCHMARKS YET
S2TLD is a traffic light dataset, which contains 5,786 images of approximately 1,080 * 1,920 pixels and 720 * 1,280 pixels. It also contains 5 categories (include red, yellow, green, off and wait on) of 1,4130 instances. The scenes cover a decent variety of road scenes and typical: * Busy street scenes inner-city, * Dense stop-and-go traffic * Strong changes in illumination/exposure * Flickering/Fluctuating traffic lights * Multiple visible traffic lights * Image parts that can be confused with traffic lights (e.g. large round tail lights)
The dataset has 10.5 hours from a single speaker.
2 PAPERS • NO BENCHMARKS YET
An open data corpus of 123.610 labeled samples,
1 PAPER • NO BENCHMARKS YET
Synthetic training set: This set is constructed in the following two steps and will be used for estimation/training purposes. i) 84,000 275 pixel x 400 pixel ground-truth fingerprint images without any noise or scratches, but with random transformations (at most five pixels translation and +/-10 degrees rotation) were generated by using the software Anguli: Synthetic Fingerprint Generator. ii) 84,000 275 pixel x 400 pixel degraded fingerprint images were generated by applying random artifacts (blur, brightness, contrast, elastic transformation, occlusion, scratch, resolution, rotation) and backgrounds to the ground-truth fingerprint images. In total, it contains 168,000 fingerprint images (84,000 fingerprints, and two impressions - one ground-truth and one degraded - per fingerprint).
Advanced pixel shift technology is employed to perform a full color sampling of the image. Pixel shift technology takes four samples of the same image at nearly the same time, and physically controls the camera sensor to move one pixel horizontally or vertically at each sampling to capture all color information at each pixel. The pixel shift technology ensures that the sampled images follow the distribution of natural images sampled by the camera, and the full information of the color (R, Gr, Gb, B channel) is completely obtained without any need of interpolation. In this way, the collected RGB images are artifacts-free, which leads to better training results for demosaicing related tasks.
The PointDenoisingBenchmark dataset features 28 different shapes, split into 18 training shapes and 10 test shapes.
The Raw Natural Image Noise Dataset (RawNIND) is a diverse collection of paired raw images designed to support the development of denoising models that generalize across sensors, image development workflows, and styles.
We propose a new light field image database called “PINet” inheriting the hierarchical structure from WordNet. It consists of 7549 LIs captured by Lytro Illum, which is much larger than the existing databases. The images are manually annotated to 178 categories according to WordNet, such as cat, camel, bottle, fans, etc. The registered depth maps are also provided. Each image is generated by processing the raw LI from the camera by Light Field Toolbox v0.4 for demosaicing and devignetting.
0 PAPER • NO BENCHMARKS YET