A dataset based on simulated challenging conditions that correspond to adversarial situations occurring in real-world environments and systems.
3 PAPERS • NO BENCHMARKS YET
FOD in Airports (FOD-A) is an image dataset of FOD (Foreign Object Debris), which consists of 31 object categories and over 30,000 annotation instances. The object categories have been selected based on guidance from prior documentation and related research by the Federal Aviation Administration (FAA).
We introduce an object detection dataset in challenging adverse weather conditions covering 12,000 samples in real-world driving scenes and 1,500 samples in controlled weather conditions within a fog chamber. The dataset includes different weather conditions such as fog, snow, and rain, and was acquired over 10,000 km of driving in northern Europe. The driven route with cities along the road is shown on the right. In total, 100k objects were labeled with accurate 2D and 3D bounding boxes. The main contributions of this dataset are:
- We provide a proving ground for a broad range of algorithms covering signal enhancement, domain adaptation, object detection, or multi-modal sensor fusion, focusing on the learning of robust redundancies between sensors, especially if they fail asymmetrically in different weather conditions.
- The dataset was created with the initial intention of showcasing methods that learn robust redundancies between the sensors and enable raw-data sensor fusion.
3 PAPERS • 1 BENCHMARK
A large-scale logo image database for logo detection and brand recognition from real-world product images.
The MUAD dataset (Multiple Uncertainties for Autonomous Driving) consists of 10,413 realistic synthetic images with diverse adverse weather conditions (night, fog, rain, snow), out-of-distribution objects, and annotations for semantic segmentation, depth estimation, and object and instance detection. Predictive uncertainty estimation is essential for the safe deployment of Deep Neural Networks in real-world autonomous systems, and MUAD allows a better assessment of the impact of different sources of uncertainty on model performance.
OSAI introduces OpenTTGames, an open dataset aimed at evaluating different computer vision tasks in table tennis: ball detection; semantic segmentation of humans, table, and scoreboard; and fast in-game event spotting.
A dataset of pedestrian traffic lights containing over 5000 photos taken at hundreds of intersections in Shanghai.
Real-world dataset of ~400 images of cuboid-shaped parcels with full 2D and 3D annotations in the COCO format.
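Several of the datasets listed here ship their annotations in the standard COCO JSON format. As a minimal sketch (the file name and the tiny inline example are hypothetical; real COCO files additionally carry "info", "licenses", and segmentation fields), bounding boxes can be read with the standard library alone:

```python
import json

# Hypothetical, minimal COCO-style annotation content for illustration.
coco = json.loads("""
{
  "images": [{"id": 1, "file_name": "parcel_0001.jpg", "width": 640, "height": 480}],
  "annotations": [{"id": 10, "image_id": 1, "category_id": 1, "bbox": [50, 60, 120, 80]}],
  "categories": [{"id": 1, "name": "parcel"}]
}
""")

# Map category ids to names, then walk the annotations.
names = {c["id"]: c["name"] for c in coco["categories"]}
boxes = []
for ann in coco["annotations"]:
    x, y, w, h = ann["bbox"]  # COCO uses [x, y, width, height] in pixels
    boxes.append((names[ann["category_id"]], (x, y, w, h)))
print(boxes)  # → [('parcel', (50, 60, 120, 80))]
```

For real files, `pycocotools.coco.COCO` offers indexed access to the same structures.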
Throughout the history of art, the pose—as the holistic abstraction of the human body's expression—has proven to be a constant in numerous studies. However, due to the enormous amount of data that so far had to be processed by hand, its crucial role in the formulaic recapitulation of art-historical motifs since antiquity could only be highlighted selectively. This is true even for the now automated estimation of human poses, as domain-specific, sufficiently large data sets required for training computational models are either not publicly available or not indexed at a fine enough granularity. With the Poses of People in Art data set, we introduce the first openly licensed data set for estimating human poses in art and validating human pose estimators. It consists of 2,454 images from 22 art-historical depiction styles, including those that have increasingly turned away from lifelike representations of the body since the 19th century. A total of 10,749 human figures are precisely enclosed.
The RailEye3D dataset, a collection of train-platform scenarios for applications targeting passenger safety and automation of train dispatching, consists of 10 image sequences captured at 6 railway stations in Austria. Annotations for multi-object tracking are provided in both a unified format and the ground-truth format used in the MOTChallenge.
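MOTChallenge ground truth stores one object per line as comma-separated values. A sketch parser following the public MOTChallenge convention (the sample line is made up, and RailEye3D's own unified format is not shown here):

```python
from typing import NamedTuple

class MOTBox(NamedTuple):
    frame: int
    track_id: int
    x: float
    y: float
    w: float
    h: float
    conf: float

def parse_mot_line(line: str) -> MOTBox:
    """Parse one MOTChallenge ground-truth line:
    <frame>,<id>,<bb_left>,<bb_top>,<bb_width>,<bb_height>,<conf>,...
    Trailing fields (class, visibility) are ignored in this sketch."""
    parts = line.strip().split(",")
    frame, tid = int(parts[0]), int(parts[1])
    x, y, w, h, conf = map(float, parts[2:7])
    return MOTBox(frame, tid, x, y, w, h, conf)

# Hypothetical ground-truth line for illustration.
box = parse_mot_line("1,3,794.27,247.59,71.25,174.84,1,-1,-1")
print(box.frame, box.track_id, box.w)  # → 1 3 71.25
```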
Separated COCO is an automatically generated subset of the COCO val dataset, collecting separated objects for a large variety of categories in real images in a scalable manner, where the target object's segmentation mask is separated into distinct regions by the occluder.
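"Separated" here means the visible mask splits into multiple connected components. A toy sketch that counts 4-connected components in a binary mask, using only the standard library (real pipelines would decode COCO RLE masks and use e.g. `scipy.ndimage.label` instead):

```python
from collections import deque

def count_components(mask):
    """Count 4-connected foreground components in a binary mask (list of lists)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    n = 0
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not seen[i][j]:
                n += 1  # new component found; flood-fill it
                q = deque([(i, j)])
                seen[i][j] = True
                while q:
                    a, b = q.popleft()
                    for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        na, nb = a + da, b + db
                        if 0 <= na < h and 0 <= nb < w and mask[na][nb] and not seen[na][nb]:
                            seen[na][nb] = True
                            q.append((na, nb))
    return n

# An occluder covering the middle column splits the object into two regions:
mask = [[1, 0, 1],
        [1, 0, 1],
        [1, 0, 1]]
print(count_components(mask))  # → 2
```

A mask would qualify as "separated" in this sense when the count exceeds 1.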
An open source Multi-View Overhead Imagery dataset with 27 unique looks from a broad range of viewing angles (-32.5 degrees to 54.0 degrees). Each of these images cover the same 665 square km geographic extent and are annotated with 126,747 building footprint labels, enabling direct assessment of the impact of viewpoint perturbation on model performance.
A new dataset for streaming classification consisting of temporally correlated images from 51 distinct object categories and additional evaluation classes outside of the training distribution to test novelty recognition.
The TimberSeg 1.0 dataset is composed of 220 images showing wood logs in various environments and conditions in Canada. The images are densely annotated with segmentation masks for each log instance, as well as the corresponding bounding box and class label. This dataset aims at enabling autonomous forestry forwarders, and it therefore contains nearly 2,500 instances of wood logs from an operator's point of view. Images were taken in the forest, near the roadside, in lumberyards, and above timber-filled trailers. The logs were annotated from a grasping perspective, meaning that only the logs on top of the piles, and thus accessible, are segmented.
Bangladeshi Sign Language Image Dataset (BdSLImset) is a dataset that contains images of different Bangladeshi sign letters.
2 PAPERS • NO BENCHMARKS YET
DeepPCB
2 PAPERS • 1 BENCHMARK
During the MILAN research project (MachIne Learning for AstroNomy), we have compiled a large collection of deep sky images during Electronically Assisted Astronomy sessions in Luxembourg, France, and Belgium.
A challenge that consists of three tasks, each targeting a different requirement for in-clinic use. The first task involves classifying images from the GI tract into 23 distinct classes. The second task focuses on efficient classification, measured by the amount of time spent processing each image. The last task relates to automatically segmenting polyps.
A 360-degree fisheye-like version of the popular FDDB face detection dataset.
This dataset is made up of forward-looking sonar images containing ten classes of underwater debris. The dataset can be used for segmentation or object detection. Applications include training computer vision models for underwater robotics applications.
This is a gun detection dataset with 51K annotated gun images for gun detection and another 51K cropped gun chip images for gun classification, collected from a few different sources.
The Human-Parts dataset is a dataset for human body, face and hand detection with ~15k images. It contains ~106k different annotations, with multiple annotations per image.
The INRIA-Horse dataset consists of 170 horse images and 170 images without horses. All horses in all images are annotated with a bounding-box. The main challenges it offers are clutter, intra-class shape variability, and scale changes. The horses are mostly unoccluded, taken from approximately the side viewpoint, and face the same direction.
The Kvasir-Capsule dataset is the largest publicly released VCE (video capsule endoscopy) dataset. In total, it contains 47,238 labeled images and 117 videos, capturing anatomical landmarks as well as pathological and normal findings. Altogether, this amounts to more than 4,741,621 images and video frames.
LiDAR-CS is a dataset for 3D object detection in real traffic. It contains 84,000 point cloud frames under 6 groups of different sensors but with the same corresponding scenarios, captured with a hybrid realistic LiDAR simulator.
Over five million images spanning 5 domains: synthetic, document, street view, handwritten, and car license plates.
2 PAPERS • 2 BENCHMARKS
Occluded COCO is an automatically generated subset of the COCO val dataset, collecting partially occluded objects for a large variety of categories in real images in a scalable manner, where the target object is partially occluded but its segmentation mask remains connected.
A realistic, diverse, and challenging dataset for object detection on images. The data was recorded at a beer tent in Germany and consists of 15 different categories of food and drink items.
Parasitic infections have been recognized by the WHO as one of the most significant causes of illness. Most infected persons shed cysts or eggs into their living environment, unwittingly transmitting parasites to other individuals. Diagnosis of intestinal parasites is usually based on direct examination in the laboratory, whose capacity is obviously limited. Aiming to automate routine fecal examination for parasitic diseases, this challenge gathers experts in the field to develop robust automated methods to detect and classify eggs of parasitic worms in a variety of microscopic images. Participants will work with a large-scale dataset containing 11 types of parasitic eggs from fecal smear samples. These are of primary interest because they cause major diseases and illness in developing countries. We are open to any techniques for parasitic egg recognition, ranging from conventional approaches based on statistical models to deep learning techniques.
Synthetic dataset of over 13,000 images of damaged and intact parcels with full 2D and 3D annotations in the COCO format. For details see our paper and for visual samples our project page.
S2TLD is a traffic light dataset containing 5,786 images of approximately 1,080 × 1,920 pixels and 720 × 1,280 pixels. It also contains 5 categories (red, yellow, green, off, and wait-on) over 14,130 instances. The scenes cover a decent variety of typical road scenarios:
* Busy inner-city street scenes
* Dense stop-and-go traffic
* Strong changes in illumination/exposure
* Flickering/fluctuating traffic lights
* Multiple visible traffic lights
* Image parts that can be confused with traffic lights (e.g. large round tail lights)
A salient object subitizing image dataset of about 14K everyday images which are annotated using an online crowdsourcing marketplace.
SmartCity consists of 50 images in total, collected from ten city scenes including office entrances, sidewalks, atriums, shopping malls, etc. Unlike existing crowd counting datasets, whose images contain hundreds or thousands of pedestrians and are nearly all taken outdoors, SmartCity has few pedestrians per image and covers both outdoor and indoor scenes: the average number of pedestrians is only 7.4, with a minimum of 1 and a maximum of 14.
The dataset of Thermal Bridges on Building Rooftops (TBBR dataset) consists of annotated combined RGB and thermal drone images with a height map. All images were converted to a uniform format of 3000×4000 pixels, aligned, and cropped to 2400×3400 to remove empty borders.
We present TNCR, a new table dataset with varying image quality collected from free open source websites. TNCR dataset can be used for table detection in scanned document images and their classification into 5 different classes.
ZeroWaste is a dataset for automatic waste detection and segmentation. This dataset contains over 1,800 fully segmented video frames collected from a real waste sorting plant along with waste material labels for training and evaluation of the segmentation methods, as well as over 6,000 unlabeled frames that can be further used for semi-supervised and self-supervised learning techniques. ZeroWaste also provides frames of the conveyor belt before and after the sorting process, comprising a novel setup that can be used for weakly-supervised segmentation.
Unsustainable fishing practices worldwide pose a major threat to marine resources and ecosystems. Identifying vessels that do not show up in conventional monitoring systems, known as "dark vessels", is key to managing and securing the health of marine environments. With the rise of satellite-based synthetic aperture radar (SAR) imaging and modern machine learning (ML), it is now possible to automate detection of dark vessels day or night, under all weather conditions. SAR images, however, require a domain-specific treatment and are not widely accessible to the ML community. Maritime objects (vessels and offshore infrastructure) are relatively small and sparse, challenging traditional computer vision approaches. We present the largest labeled dataset for training ML models to detect and characterize vessels and ocean structures in SAR imagery. xView3-SAR consists of nearly 1,000 analysis-ready SAR images from the Sentinel-1 mission that are, on average, 29,400-by-24,400 pixels each.
360-SOD contains 500 high-resolution equirectangular images.
1 PAPER • NO BENCHMARKS YET
The dataset contains aerial agricultural images of a potato field with manual labels of healthy and stressed plant regions. The images were collected with a Parrot Sequoia multispectral camera carried by a 3DR Solo drone flying at an altitude of 3 meters. The dataset consists of RGB images with a resolution of 750×750 pixels; spectral monochrome red, green, red-edge, and near-infrared images with a resolution of 416×416 pixels; and XML files with annotated bounding boxes of healthy and stressed potato crop.
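Bounding boxes delivered as per-image XML files typically follow a Pascal-VOC-like layout. A sketch of parsing one such file with the standard library (the element names and the inline example are assumptions based on the common VOC schema; the actual files may differ):

```python
import xml.etree.ElementTree as ET

# Hypothetical VOC-style annotation for one aerial image.
xml_text = """
<annotation>
  <filename>field_0001.jpg</filename>
  <object>
    <name>healthy</name>
    <bndbox><xmin>12</xmin><ymin>30</ymin><xmax>110</xmax><ymax>95</ymax></bndbox>
  </object>
  <object>
    <name>stressed</name>
    <bndbox><xmin>200</xmin><ymin>40</ymin><xmax>260</xmax><ymax>120</ymax></bndbox>
  </object>
</annotation>
"""

root = ET.fromstring(xml_text)
boxes = []
for obj in root.iter("object"):
    bb = obj.find("bndbox")
    # VOC stores corner coordinates (xmin, ymin, xmax, ymax) in pixels.
    boxes.append((
        obj.findtext("name"),
        tuple(int(bb.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax")),
    ))
print(boxes)
```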
1 PAPER • 1 BENCHMARK
This is a public dataset for evaluating multi-object detection algorithms in active Terahertz imaging with a resolution of 5 mm × 5 mm.
The Apron Dataset focuses on training and evaluating classification and detection models for airport-apron logistics. In addition to bounding boxes and object categories, the dataset is enriched with meta parameters to quantify the models’ robustness against environmental influences.
This dataset contains 369 images of trash used for deep learning. Each image is manually labelled by our team for accurate detection, for a total of 470 bounding boxes. There are 4 classes: 0: glass, 1: paper, 2: metal, 3: plastic.
P. vivax (malaria) infected human blood smears with bounding box annotations. The data consists of two classes of uninfected cells (RBCs and leukocytes) and four classes of infected cells (gametocytes, rings, trophozoites, and schizonts).
CLAD (Complex and Long Activities Dataset) is an activity dataset which exhibits real-life and diverse scenarios of complex, temporally-extended human activities and actions. The dataset consists of a set of videos of actors performing everyday activities in a natural and unscripted manner. It was recorded using a static Kinect 2 sensor, which is commonly used on many robotic platforms. The dataset comprises RGB-D images, point cloud data, and automatically generated skeleton tracks, in addition to crowdsourced annotations.
The training and validation data are subsets of the training split of the MS COCO dataset (2017 release, bounding boxes only). The test set is taken from the validation split of the MS COCO dataset.
CPPE - 5 (Medical Personal Protective Equipment) is a new challenging dataset with the goal of allowing the study of subordinate categorization of medical personal protective equipment, which is not possible with other popular datasets that focus on broad-level categories.
Cattle data set, introduced in a prior paper. We (not the original authors) created a train-val-test split.