AUT-VI is a highly challenging visual-inertial dataset with 126 diverse sequences recorded in 17 locations. The dataset contains dynamic objects, challenging loop-closure/map-reuse situations, varying lighting conditions, reflections, and sudden camera movements, covering a wide range of extreme navigation scenarios. Moreover, the Android application used for data capture has been released to the public to support ongoing development efforts. The dataset aims to expose the remaining challenges in VIO algorithms, in the hope of improving them to facilitate navigation for visually impaired individuals in both indoor and outdoor settings.
1 PAPER • NO BENCHMARKS YET
The availability of well-curated datasets has driven the success of Machine Learning (ML) models. Despite greater access to earth observation data in agriculture, there is a scarcity of curated and labelled datasets, which limits their potential for training ML models for remote sensing (RS) in agriculture. To this end, we introduce a first-of-its-kind dataset called SICKLE, which constitutes a time series of multi-resolution imagery from 3 distinct satellites: Landsat-8, Sentinel-1 and Sentinel-2. Our dataset comprises imagery from multi-spectral, thermal and microwave sensors acquired during the January 2018 - March 2021 period. We construct each temporal sequence by considering the cropping practices followed by farmers primarily engaged in paddy cultivation in the Cauvery Delta region of Tamil Nadu, India, and annotate the corresponding imagery with key cropping parameters at multiple resolutions (i.e. 3 m, 10 m and 30 m). Our dataset comprises 2,370 season-wise samples from 388 unique plots, having
1 PAPER • 5 BENCHMARKS
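Working with annotations at the three resolutions above (3 m, 10 m, 30 m) typically requires resampling rasters onto a common grid. A minimal NumPy sketch of nearest-neighbour upsampling of a 30 m band onto a 10 m grid (illustrative only; `upsample_nearest` is a hypothetical helper, not part of any SICKLE tooling, and real pipelines would use a geospatial resampler):

```python
import numpy as np

def upsample_nearest(band: np.ndarray, factor: int) -> np.ndarray:
    """Repeat each pixel `factor` times along both axes
    (e.g. 30 m -> 10 m pixel spacing when factor=3)."""
    return np.repeat(np.repeat(band, factor, axis=0), factor, axis=1)

landsat_30m = np.arange(4, dtype=float).reshape(2, 2)  # toy 2x2 30 m band
landsat_10m = upsample_nearest(landsat_30m, 3)         # 6x6 array on the 10 m grid
```

Nearest-neighbour keeps label values intact, which matters when the resampled raster holds class annotations rather than reflectances.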
75k photos of windows + 21k synthetic renders of building windows.
InfraParis is a novel and versatile dataset supporting multiple tasks across three modalities: RGB, depth, and infrared. Spanning the city of Paris and its suburbs, it captures a variety of urban styles across the greater Paris region, providing rich semantic information. InfraParis contains 7,301 images with bounding boxes and full semantic annotations (19 classes). We assess various state-of-the-art baseline techniques, encompassing models for semantic segmentation, object detection, and depth estimation.
Synthetic humans generated by the RePoGen method.
A multi-task 4D radar-camera fusion dataset for autonomous driving on water surfaces.
8 PAPERS • 2 BENCHMARKS
This is the first general Underwater Image Instance Segmentation (UIIS) dataset, containing 4,628 images across 7 categories with pixel-level annotations for the underwater instance segmentation task.
1 PAPER • 1 BENCHMARK
ISOD contains 2,000 manually labelled RGB-D images from 20 diverse sites, each featuring over 30 types of small objects randomly placed amidst the items already present in the scenes. These objects, typically ≤3cm in height, include LEGO blocks, rags, slippers, gloves, shoes, cables, crayons, chalk, glasses, smartphones (and their cases), fake banana peels, fake pet waste, and piles of toilet paper, among others. These items were chosen because they either threaten the safe operation of indoor mobile robots or create messes if run over.
The PAX-Ray++ dataset uses pseudo-labeled thorax CTs to enable the segmentation of anatomy in chest X-rays. By projecting the CTs to a 2D plane, we gather fine-grained annotated images resembling radiographs. It contains 7,377 frontal and lateral view images, each with 157 anatomy classes, and over 2 million annotated instances.
2 PAPERS • NO BENCHMARKS YET
This dataset is a collection of fluorescent images from mice, collected to test an automatic cell counting tool that we developed. It contains 62 images taken from 2 or 3 different fields of view. In brief, the dataset was derived from brain sections of a model for HIV-induced brain injury (HIVgp120tg), which expresses soluble gp120 envelope protein in astrocytes under the control of a modified GFAP promoter. The mice were in a mixed C57BL/6.129/SJL genetic background, and two genotypes of 9-month-old male mice were selected: wild-type controls (Resting, n = 3) and transgenic littermates (HIVgp120tg, Activated, n = 3). No randomization was performed. HIVgp120tg mice show, among other hallmarks of human HIV neuropathology, an increase in microglia numbers, which indicates activation of the cells compared to non-transgenic littermate controls.
A dataset of 100K synthetic images of skin lesions, ground-truth (GT) segmentations of lesions and healthy skin, GT segmentations of seven body parts (head, torso, hips, legs, feet, arms and hands), and GT binary masks of non-skin regions in the texture maps of 215 scans from the 3DBodyTex.v1 dataset [2], [3] created using the framework described in [1]. The dataset is primarily intended to enable the development of skin lesion analysis methods. Synthetic image creation consisted of two main steps. First, skin lesions from the Fitzpatrick 17k dataset were blended onto skin regions of high-resolution three-dimensional human scans from the 3DBodyTex dataset [2], [3]. Second, two-dimensional renders of the modified scans were generated.
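The first step above is, at its core, an alpha blend of a lesion patch into a skin texture. A minimal 2D sketch under that assumption (illustrative only; the actual framework in [1] blends onto the texture maps of 3D scans and is considerably more sophisticated):

```python
import numpy as np

def blend_lesion(skin: np.ndarray, lesion: np.ndarray,
                 alpha_mask: np.ndarray, top: int, left: int) -> np.ndarray:
    """Alpha-blend `lesion` (h, w, 3) onto `skin` (H, W, 3) at (top, left),
    weighting each lesion pixel by `alpha_mask` in [0, 1]."""
    out = skin.astype(float).copy()
    h, w = lesion.shape[:2]
    region = out[top:top + h, left:left + w]
    a = alpha_mask[..., None]  # broadcast the mask over the RGB channels
    out[top:top + h, left:left + w] = a * lesion + (1 - a) * region
    return out

skin = np.full((8, 8, 3), 100.0)    # toy uniform skin texture
lesion = np.full((2, 2, 3), 200.0)  # toy lesion patch
alpha = np.full((2, 2), 0.5)        # 50% opacity everywhere
blended = blend_lesion(skin, lesion, alpha, 2, 2)
```

A soft (feathered) alpha mask avoids the hard seams that a binary paste would leave at the lesion boundary.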
The dataset is recorded with an on-vehicle ZED stereo camera in both urban and rural environments.
Open Images is a computer vision dataset covering ~9 million images with labels spanning thousands of object categories. A subset of 1.9M images includes diverse annotation types.
3 PAPERS • NO BENCHMARKS YET
CheXlocalize is a radiologist-annotated segmentation dataset on chest X-rays. The dataset consists of two types of radiologist annotations for the localization of 10 pathologies: pixel-level segmentations and most-representative points. Annotations were drawn on images from the CheXpert validation and test sets. The dataset also consists of two separate sets of radiologist annotations: (1) ground-truth pixel-level segmentations on the validation and test sets, drawn by two board-certified radiologists, and (2) benchmark pixel-level segmentations and most-representative points on the test set, drawn by a separate group of three board-certified radiologists.
HuTics contains 2,040 images showing how humans use deictic gestures to interact with various daily-life objects. The images are annotated with segmentation masks of the object(s) of interest. The data were originally collected for gesture-aware, object-agnostic segmentation tasks.
The Fetoscopic Placental Vessel Segmentation and Registration (FetReg2021) challenge was organized as part of the MICCAI 2021 Endoscopic Vision (EndoVis) challenge. Through the FetReg2021 challenge, we released the first large-scale multi-centre dataset of the fetoscopy laser photocoagulation procedure. The dataset contains 2,718 pixel-wise annotated images (background, vessel, fetus, and tool classes) from 24 different in vivo TTTS fetoscopic surgeries, plus 24 unannotated video clips containing 9,616 frames for training and testing. The dataset is useful for the development of generalized and robust semantic segmentation and video mosaicking algorithms for long-duration fetoscopy videos.
Unsupervised Domain Adaptation demonstrates great potential to mitigate domain shifts by transferring models from labeled source domains to unlabeled target domains. While Unsupervised Domain Adaptation has been applied to a wide variety of complex vision tasks, only a few works focus on lane detection for autonomous driving. This can be attributed to the lack of publicly available datasets. To facilitate research in these directions, we propose CARLANE, a 3-way sim-to-real domain adaptation benchmark for 2D lane detection. CARLANE encompasses the single-target datasets MoLane and TuLane and the multi-target dataset MuLane. These datasets are built from three different domains, which cover diverse scenes and contain a total of 163K unique images, 118K of which are annotated. In addition, we evaluate and report systematic baselines, including our own method, which builds upon Prototypical Cross-domain Self-supervised Learning. We find that false positive and false negative rates of the eva
3 PAPERS • 3 BENCHMARKS
This dataset is an extremely challenging set of over 20,000 original construction vehicle images captured and crowdsourced from more than 600 urban and rural areas, where each image is manually reviewed and verified by computer vision professionals at Datacluster Labs.
0 PAPERS • NO BENCHMARKS YET
Simulated pulse Doppler radar signatures for four classes of helicopter-like targets. The classes differ in the number of rotating blades each kind of target carries, so each class translates into a specific modulation pattern on the Doppler signature. Doppler signatures are a typical feature used to achieve radar target discrimination. This dataset was generated using a simple open-source MATLAB simulation code, which can be easily modified to generate custom datasets with more classes and increased intra-class diversity.
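The blade-count-dependent modulation described above can be reproduced in a few lines. Below is a toy model written in Python rather than MATLAB (every parameter and the function name are illustrative, not taken from the released simulator): each blade tip contributes a unit echo whose Doppler phase follows its projected radial position, so a spectrogram of the returned signal shows a modulation pattern set by the number of blades.

```python
import numpy as np

def blade_signature(n_blades, fs=10_000, duration=0.1, rot_hz=20.0,
                    blade_len=3.0, fc=10e9, c=3e8):
    """Toy micro-Doppler return of a rotor with `n_blades` blades.

    Each blade tip is modeled as a unit scatterer; its two-way phase
    shift 4*pi*fc/c * r(t) is modulated by the sinusoidal radial
    position r(t) of the rotating tip.
    """
    t = np.arange(int(fs * duration)) / fs
    sig = np.zeros(t.shape, dtype=complex)
    for k in range(n_blades):
        angle = 2 * np.pi * (rot_hz * t + k / n_blades)  # blade k's rotation angle
        radial = blade_len * np.cos(angle)               # tip position along line of sight
        sig += np.exp(1j * 4 * np.pi * fc / c * radial)  # two-way phase term
    return sig

sig = blade_signature(4)  # 4-blade target, 1000 complex samples
```

Feeding `sig` to a short-time Fourier transform yields the kind of signature the dataset's classes are distinguished by.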
Dynamic occupancy grids generated from the NuScenes dataset. The dataset contains static environment and semantic labels, useful for long-term prediction tasks.
This brain tumor dataset contains 3,064 T1-weighted contrast-enhanced images with three kinds of brain tumors. Detailed information on the dataset can be found in the readme file.
TEM image dataset containing four nanowire morphologies of bio-derived protein nanowires and synthetic peptide nanowires.
Onchocerciasis is causing blindness in over half a million people in the world today. Drug development for the disease is crippled because there is no way of measuring the effectiveness of a drug without an invasive procedure. Measuring drug efficacy through assessment of the viability of Onchocerca worms requires patients to undergo nodulectomy, which is an invasive, expensive, time-consuming, skill-dependent, and infrastructure-dependent process.
Optical images of printed circuit boards as well as detailed annotations of any text, logos, and surface-mount devices (SMDs). There are several hundred samples spanning a wide variety of manufacturing locations, sizes, node technology, applications, and more.
The dataset X of this work is an extension of the heartSeg dataset. Each sample x ∈ X is an RGB image capturing the heart region of Medaka (Oryzias latipes) hatchlings from a constant ventral view. Since the body of Medaka is transparent, noninvasive studies of the internal organs and the whole circulatory system are practicable. A Medaka's heart contains three parts: the atrium, the ventricle, and the bulbus. The atrium receives deoxygenated blood from the circulatory system and delivers it to the ventricle, which forwards it into the bulbus. The bulbus is the heart's exit chamber and provides the gill arches with a constant blood flow. The blood flow through these three chambers was captured in 63 short recordings (around 11 seconds each, at 24 frames per second), from which the single image samples x ∈ X are extracted. The dataset is split into training and test data following the heartSeg dataset, with n_train = 565 samples in the training set X_train and n_test = 165
InfiniteRep is a synthetic, open-source dataset for fitness and physical therapy (PT) applications. It includes 1k videos of diverse avatars performing multiple repetitions of common exercises. It includes significant variation in the environment, lighting conditions, avatar demographics, and movement trajectories. From cadence to kinematic trajectory, each rep is done slightly differently -- just like real humans. InfiniteRep videos are accompanied by a rich set of pixel-perfect labels and annotations, including frame-specific repetition counts.
FaceOcc is a high-quality face occlusion dataset that corrects all mislabeled occlusions in CelebAMask-HQ and supplements them with additional occlusions and textures from the internet. The occlusion types cover sunglasses, spectacles, hands, masks, scarves, microphones, etc.
A large-scale video portrait dataset that contains 291 videos from 23 conference scenes with 14K frames. The dataset covers various teleconferencing scenes, various actions of the participants, interference from passers-by, and illumination changes.
Stack of 2D gray images of glass fiber-reinforced polyamide 66 (GF-PA66) 3D X-ray Computed Tomography (XCT) specimen.
By releasing this dataset, we aim to provide a new testbed for computer vision techniques using deep learning. The main peculiarity is the shift from the domain of "natural images", typical of common benchmark datasets, to biological imaging. We anticipate that the advantages of doing so are two-fold: i) fostering research in biomedical-related fields, for which popular pre-trained models typically perform poorly, and ii) promoting methodological research in deep learning by addressing the peculiar requirements of these images. Possible applications include, but are not limited to, semantic segmentation, object detection and object counting. The data consist of 283 high-resolution pictures (1600x1200 pixels) of mouse brain slices acquired through a fluorescence microscope. The final goal is to detect and count the neurons highlighted in the pictures by means of a marker, so as to assess the result of a biological experiment. The corresponding ground-truth labels were generated through a hy
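A simple baseline for the counting task described above is to threshold the fluorescence image and count connected bright regions. A minimal pure-NumPy sketch (illustrative only; the dataset's ground truth came from the hybrid procedure mentioned in the entry, not from code like this):

```python
import numpy as np

def count_bright_cells(img: np.ndarray, thresh: float, min_size: int = 5) -> int:
    """Count 4-connected regions brighter than `thresh` that span at
    least `min_size` pixels, using an explicit flood fill."""
    mask = img > thresh
    seen = np.zeros_like(mask, dtype=bool)
    h, w = mask.shape
    count = 0
    for i in range(h):
        for j in range(w):
            if mask[i, j] and not seen[i, j]:
                # flood-fill this component and measure its size
                stack, size = [(i, j)], 0
                seen[i, j] = True
                while stack:
                    y, x = stack.pop()
                    size += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            stack.append((ny, nx))
                if size >= min_size:  # ignore specks below the size threshold
                    count += 1
    return count
```

The `min_size` filter discards small bright specks that are noise rather than marked neurons; touching neurons, which this crude scheme merges into one region, are exactly what learned detectors handle better.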
KITTI-360 is a large-scale dataset that contains rich sensory information and full annotations. It is the successor of the popular KITTI dataset, providing more comprehensive semantic/instance labels in 2D and 3D, richer 360 degree sensory information (fisheye images and pushbroom laser scans), very accurate and geo-localized vehicle and camera poses, and a series of new challenging benchmarks.
161 PAPERS • 6 BENCHMARKS
This dataset contains images of corn seeds, with the top and bottom views captured independently (two images per corn seed: top and bottom). There are four classes of corn seed: Broken (B), Discolored (D), Silkcut (S), and Pure (P). 17,802 images were labeled by experts at AdTech Corp., and 26K images were unlabeled, of which 9K were subsequently labeled using Active Learning (BatchBALD).
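The active-learning step above uses BatchBALD; as a rough illustration, here is a sketch of the plain BALD acquisition score it builds on (a generic sketch, not the AdTech pipeline; BatchBALD extends this by scoring batches of points jointly rather than independently):

```python
import numpy as np

def bald_scores(mc_probs: np.ndarray) -> np.ndarray:
    """BALD acquisition: mutual information between the predicted label
    and the model posterior, estimated from MC-dropout samples.

    mc_probs: (n_samples, n_points, n_classes) class probabilities from
    repeated stochastic forward passes.
    """
    mean = mc_probs.mean(axis=0)
    # entropy of the mean prediction (total uncertainty)
    entropy_mean = -(mean * np.log(mean + 1e-12)).sum(axis=-1)
    # mean entropy of individual predictions (aleatoric part)
    mean_entropy = -(mc_probs * np.log(mc_probs + 1e-12)).sum(axis=-1).mean(axis=0)
    return entropy_mean - mean_entropy  # higher = more informative to label
```

Points where the stochastic passes disagree (high total entropy, low per-pass entropy) score highest and are sent to the annotators first.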
Out-of-Context Cityscapes (OC-Cityscapes) is a new dataset built by replacing the roads in the validation data of Cityscapes with various textures such as water, sand, and grass.
The Fascicle Lower Leg Muscle Ultrasound Dataset is composed of 812 ultrasound images of lower leg muscles, intended for analyzing muscle weaknesses and preventing injuries. It combines, with complementary annotations, the datasets provided by two articles: "Estimating Full Regional Skeletal Muscle Fibre Orientation from B-Mode Ultrasound Images Using Convolutional, Residual, and Deconvolutional Neural Networks" by Ryan Cunningham et al. and "Automated Analysis of Musculoskeletal Ultrasound Images Using Deep Learning" by Neil Cronin. The dataset was introduced in: Michard, H., Luvison, B., Pham, Q. C., Morales-Artacho, A. J., & Guilhem, G. (2021, August). AW-Net: automatic muscle structure analysis on B-mode ultrasound images for injury prevention. In Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (pp. 1-9).
This dataset was acquired with the Airphen (Hyphen, Avignon, France) six-band multi-spectral camera configured to use the 450/570/675/710/730/850 nm bands with a 10 nm FWHM. It was acquired on the INRAe site in Montoldre (Allier, France, at 46°20'30.3"N 3°26'03.6"E) within the framework of the "RoSE challenge" funded by the French National Research Agency (ANR). The images contain bean plants, with various natural weeds (yarrows, amaranth, geranium, plantago, etc.) and sowed ones (mustards, goosefoots, mayweed and ryegrass), under very distinct illumination conditions (shadow, morning, evening, full sun, cloudy, rain, ...). The ground truth is defined for each image with polygons around leaf boundaries; in addition, each polygon is labeled as crop or weed. (2020-06-11)
This dataset includes multi-spectral acquisitions of vegetation for the design of new DeepIndices. The images were acquired with the Airphen (Hyphen, Avignon, France) six-band multi-spectral camera configured to use the 450/570/675/710/730/850 nm bands with a 10 nm FWHM. The dataset was acquired on the INRAe site in Montoldre (Allier, France, at 46°20'30.3"N 3°26'03.6"E) within the framework of the "RoSE challenge" funded by the French National Research Agency (ANR), and in Dijon (Burgundy, France, at 47°18'32.5"N 5°04'01.8"E) on the AgroSup Dijon site. Images of bean and corn, containing various natural weeds (yarrows, amaranth, geranium, plantago, etc.) and sowed ones (mustards, goosefoots, mayweed and ryegrass), under very distinct illumination conditions (shadow, morning, evening, full sun, cloudy, rain, ...), were acquired in top-down view at 1.8 meters from the ground. (2020-05-01)
RaidaR is a richly annotated image dataset of rainy street scenes. RaidaR consists of 58,542 real rainy images containing several rain-induced artifacts: fog, droplets, road reflections, etc. 5,000 images were carefully annotated with semantic segmentation and 3,658 with instance segmentation.
A test dataset for semantic segmentation. The dataset includes 500 RGB images with the corresponding single-channel binary masks. The images were taken from vineyards in Grugliasco, Turin, Piedmont, Italy.
Satellite imagery analytics have numerous human development and disaster response applications, particularly when time series methods are involved. For example, quantifying population statistics is fundamental to 67 of the 232 United Nations Sustainable Development Goals, but the World Bank estimates that more than 100 countries currently lack effective Civil Registration systems. The SpaceNet 7 Multi-Temporal Urban Development Challenge aims to help address this deficit and develop novel computer vision methods for non-video time series data. In this challenge, participants will identify and track buildings in satellite imagery time series collected over rapidly urbanizing areas. The competition centers around a new open source dataset of Planet satellite imagery mosaics, which includes 24 images (one per month) covering ~100 unique geographies. The dataset will comprise over 40,000 square kilometers of imagery and exhaustive polygon labels of building footprints in the imagery, total
9 PAPERS • NO BENCHMARKS YET
FES is an indoor dataset that can be used for evaluation of deep learning approaches. It consists of 301 top-view fisheye images from an indoor scene. Annotations include bounding boxes and instance segmentation masks for 6 classes.
4 PAPERS • NO BENCHMARKS YET
The dataset of Thermal Bridges on Building Rooftops (TBBR dataset) consists of annotated combined RGB and thermal drone images with a height map. All images were converted to a uniform format of 3000×4000 pixels, aligned, and cropped to 2400×3400 to remove empty borders.
2 PAPERS • 2 BENCHMARKS
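The border-removal step in the TBBR entry above can be sketched as a symmetric center crop (an assumption for illustration; the exact offsets used for the dataset may differ):

```python
import numpy as np

def center_crop(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Crop `img` centrally to (out_h, out_w), e.g. 3000x4000 -> 2400x3400
    to strip empty borders. Works for 2D and (H, W, C) arrays."""
    h, w = img.shape[:2]
    top, left = (h - out_h) // 2, (w - out_w) // 2
    return img[top:top + out_h, left:left + out_w]

# toy example at 1/100 scale of the TBBR geometry
img = np.arange(30 * 40).reshape(30, 40)
cropped = center_crop(img, 24, 34)
```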
This basketball dataset was acquired under the Walloon region project DeepSport, using the Keemotion system installed in multiple arenas. We would like to thank Keemotion for letting us use their system for raw image acquisition during live productions, and the LNB for the rights to their images.
5 PAPERS • NO BENCHMARKS YET
We create the Rwanda built-up regions dataset, which is different and more versatile in nature than previously available datasets. The varying structure sizes and formations, irregular construction patterns, buildings in forests and deserts, and the existence of mud houses make it very challenging. A total of 787 satellite images of size 256 × 256 were collected at a high resolution (HR) of 1.193 meters per pixel and hand-tagged for built-up region segmentation using the online tool Label-Box.
The BCSS dataset contains over 20,000 segmentation annotations of tissue regions from breast cancer images from The Cancer Genome Atlas (TCGA). This large-scale dataset was annotated through the collaborative effort of pathologists, pathology residents, and medical students using the Digital Slide Archive. It enables the generation of highly accurate machine-learning models for tissue segmentation.
The temporal variability in the calving front positions of marine-terminating glaciers permits inference of frontal ablation. Frontal ablation, the sum of the calving rate and the melt rate at the terminus, contributes significantly to the mass balance of glaciers. Therefore, glacier area has been declared an Essential Climate Variable product by the World Meteorological Organization. The presented dataset provides the necessary information for training deep learning techniques to automate the process of calving front delineation. The dataset includes Synthetic Aperture Radar (SAR) images of seven glaciers distributed around the globe. Five of them are located in Antarctica: Crane, Dinsmoore-Bombardier-Edgeworth, Mapple, Jorum and the Sjörgen-Inlet Glacier. The remaining glaciers are the Jakobshavn Isbrae Glacier in Greenland and the Columbia Glacier in Alaska. Several images were taken of each glacier, forming a time series. The time series lie in the time span between 1995 an
CaDIS: a Cataract Dataset for Image Segmentation is a dataset for semantic segmentation created by Digital Surgery Ltd. on top of the CATARACTS dataset. CaDIS consists of 4,670 images sampled from the 25 videos of the CATARACTS training set. Each pixel in each image is labeled with its respective instrument or anatomical class from a set of 36 identified classes. More details about the dataset can be found in the paper (https://arxiv.org/pdf/1906.11586.pdf).
7 PAPERS • 3 BENCHMARKS