An object-centric version of Stylized COCO to benchmark texture bias and out-of-distribution robustness of vision models. See the ECCV 22 paper and supplementary material for details.
1 PAPER • NO BENCHMARKS YET
InsPLAD is a Dataset for Power Line Asset Inspection containing 10,607 high-resolution colour images captured by Unmanned Aerial Vehicles (UAVs). It contains 17 unique power line assets captured from real-world operating power lines. Five of those assets are also annotated with their condition. They present the following defects: corrosion (4 of them), broken/missing cap (1 of them), and bird's nest presence (1 of them).
4 PAPERS • 1 BENCHMARK
TAMPAR is a real-world dataset of parcel photos for tampering detection with annotations in COCO format. For details see the paper and for visual samples the project page. Features are:
FractureAtlas is a musculoskeletal bone fracture dataset with annotations for deep learning tasks like classification, localization, and segmentation. The dataset contains a total of 4,083 X-ray images with annotations in COCO, VGG, YOLO, and Pascal VOC formats. The dataset is freely available for any purpose under a CC-BY 4.0 license: the data may be copied, shared, or redistributed in any medium or format, and may be adapted, remixed, transformed, and built upon. Note that using the dataset correctly requires knowledge of the medical and radiology fields to understand the results and draw conclusions from them. It is also important to consider the possibility of labeling errors.
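The COCO, YOLO, and Pascal VOC formats mentioned above differ mainly in how a bounding box is encoded. The sketch below converts between them using the commonly used conventions (COCO: absolute top-left corner plus width and height; Pascal VOC: absolute corner coordinates; YOLO: centre and size normalised by the image dimensions). It does not reflect FractureAtlas's actual file layout, which the description does not specify; the function names and the example box are hypothetical.

```python
def coco_to_voc(box):
    """COCO [x_min, y_min, width, height] -> Pascal VOC [x_min, y_min, x_max, y_max]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def coco_to_yolo(box, img_w, img_h):
    """COCO absolute box -> YOLO [x_center, y_center, width, height] in [0, 1]."""
    x, y, w, h = box
    return [(x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h]


# Hypothetical box on a hypothetical 400x200 image (not taken from the dataset).
example_box = [100, 50, 200, 100]
voc_box = coco_to_voc(example_box)
yolo_box = coco_to_yolo(example_box, img_w=400, img_h=200)
```

In YOLO label files each box is usually preceded by an integer class index, which is omitted here to keep the geometry in focus.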
COCO-O(ut-of-distribution) contains 6 domains (sketch, cartoon, painting, weather, handmake, tattoo) of COCO objects that most existing detectors find hard to detect. The dataset has a total of 6,782 images and 26,624 labelled bounding boxes.
41 PAPERS • 1 BENCHMARK
A Multi-Task 4D Radar-Camera Fusion Dataset for Autonomous Driving on Water Surfaces.
8 PAPERS • 2 BENCHMARKS
Understanding comprehensive assembly knowledge from videos is critical for futuristic ultra-intelligent industry. To enable technological breakthrough, we present HA-ViD – an assembly video dataset that features representative industrial assembly scenarios, natural procedural knowledge acquisition process, and consistent human-robot shared annotations. Specifically, HA-ViD captures diverse collaboration patterns of real-world assembly, natural human behaviors and learning progression during assembly, and granulate action annotations to subject, action verb, manipulated object, target object, and tool. We provide 3222 multi-view and multi-modality videos, 1.5M frames, 96K temporal labels and 2M spatial labels. We benchmark four foundational video understanding tasks: action recognition, action segmentation, object detection and multi-object tracking. Importantly, we analyze their performance and the further reasoning steps for comprehending knowledge in assembly progress, process effici
This is the first general Underwater Image Instance Segmentation (UIIS) dataset, containing 4,628 images in 7 categories with pixel-level annotations for the underwater instance segmentation task.
1 PAPER • 1 BENCHMARK
The PhenoBench dataset contains multiple image segmentation challenges from the agricultural domain.
5 PAPERS • NO BENCHMARKS YET
Plant factories are an advanced form of facility agriculture that enable efficient plant cultivation through controllable environmental conditions, making them highly suitable for the automation and intelligent application of machinery. Tomato cultivation in plant factories has significant economic and agricultural value and can be utilized for various applications such as seedling cultivation, breeding, and genetic engineering. However, manual completion is still required for operations such as detection, counting, and classification of tomato fruits, and the application of machine detection is currently inefficient. Furthermore, research on the automation of tomato harvesting in plant factory environments is limited due to the lack of a suitable dataset. To address this issue, a tomato fruit dataset was constructed for plant factory environments, named as TomatoPlantfactoryDataset, which can be quickly applied to multiple tasks, including the detection of control systems, harvesting
0 PAPERS • NO BENCHMARKS YET
The Remote Sensing dataset contains the following key features for each annotated marking:
The CropAndWeed dataset is focused on the fine-grained identification of 74 relevant crop and weed species with a strong emphasis on data variability. Annotations of labeled bounding boxes, semantic masks and stem positions are provided for about 112k instances in more than 8k high-resolution images of both real-world agricultural sites and specifically cultivated outdoor plots of rare weed types. Additionally, each sample is enriched with meta-annotations regarding environmental conditions.
4 PAPERS • NO BENCHMARKS YET
The Apron Dataset focuses on training and evaluating classification and detection models for airport-apron logistics. In addition to bounding boxes and object categories the dataset is enriched with meta parameters to quantify the models’ robustness against environmental influences.
Satlas is a remote sensing dataset and benchmark that is large in both breadth, featuring all of the aforementioned applications and more, as well as scale, comprising 290M labels under 137 categories and 7 label modalities.
7 PAPERS • NO BENCHMARKS YET
The evaluation of object detection models is usually performed by optimizing a single metric, e.g. mAP, on a fixed set of datasets, e.g. Microsoft COCO and Pascal VOC. Due to image retrieval and annotation costs, these datasets consist largely of images found on the web and do not represent many real-life domains that are being modelled in practice, e.g. satellite, microscopic and gaming, making it difficult to assert the degree of generalization learned by the model.
Open Images is a computer vision dataset covering ~9 million images with labels spanning thousands of object categories. A subset of 1.9M images includes diverse annotation types.
Occluded COCO is an automatically generated subset of the COCO val dataset that collects partially occluded objects for a large variety of categories in real images in a scalable manner. In each case the target object is partially occluded but its segmentation mask remains connected.
2 PAPERS • 1 BENCHMARK
Separated COCO is an automatically generated subset of the COCO val dataset that collects separated objects for a large variety of categories in real images in a scalable manner. In each case the target object's segmentation mask is split into distinct regions by the occluder.
3 PAPERS • 1 BENCHMARK
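The distinction between the Occluded COCO and Separated COCO entries above comes down to whether the visible part of an object forms one connected region or several. A minimal sketch of that check, assuming a binary mask stored as a list of lists and 4-connectivity (both assumptions; this is not the authors' generation pipeline, and the function names and label strings are hypothetical):

```python
from collections import deque


def count_mask_regions(mask):
    """Count 4-connected regions of non-zero cells in a 2D list-of-lists mask."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    regions = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # New unvisited foreground cell: flood-fill its whole region.
            regions += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if 0 <= nr < rows and 0 <= nc < cols and mask[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
    return regions


def occlusion_type(mask):
    """Return 'connected', 'separated', or 'empty' for a visible-part mask."""
    regions = count_mask_regions(mask)
    if regions == 0:
        return "empty"
    return "connected" if regions == 1 else "separated"


# Toy masks: one contiguous blob, and one object cut in two by an occluder.
connected_mask = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
separated_mask = [
    [1, 1, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
]
```

Whether an object is occluded at all cannot be read from the mask alone, so this check only separates the two cases once occlusion is already known.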
The study showed that apple scab can be detected in high-resolution RGB images at an early stage of its development. If the two datasets, covering the early and advanced stages, are grouped together, scab in the early stage is no longer visible once the images are resized to CNN input sizes of 200-500 px.
Throughout the history of art, the pose—as the holistic abstraction of the human body's expression—has proven to be a constant in numerous studies. However, due to the enormous amount of data that so far had to be processed by hand, its crucial role to the formulaic recapitulation of art-historical motifs since antiquity could only be highlighted selectively. This is true even for the now automated estimation of human poses, as domain-specific, sufficiently large data sets required for training computational models are either not publicly available or not indexed at a fine enough granularity. With the Poses of People in Art data set, we introduce the first openly licensed data set for estimating human poses in art and validating human pose estimators. It consists of 2,454 images from 22 art-historical depiction styles, including those that have increasingly turned away from lifelike representations of the body since the 19th century. A total of 10,749 human figures are precisely enclos
DeepPCB
The ELEVATER benchmark is a collection of resources for training, evaluating, and analyzing language-image models on image classification and object detection. ELEVATER consists of:
23 PAPERS • 2 BENCHMARKS
Cattle data set, which was introduced in a paper. We (not the authors) created a train-val-test split.
SeaDronesSee is a large-scale data set aimed at helping develop systems for Search and Rescue (SAR) using Unmanned Aerial Vehicles (UAVs) in maritime scenarios. Building highly complex autonomous UAV systems that aid in SAR missions requires robust computer vision algorithms to detect and track objects or persons of interest. This data set provides three sets of tracks: object detection, single-object tracking and multi-object tracking. Each track consists of its own data set and leaderboard.
16 PAPERS • 3 BENCHMARKS
Object detection data set created with the DeepGTAV engine, which is based on the video game GTAV. It is one of the three data sets proposed in the paper. This data set is modelled on the Cattle data set and uses almost the same classes.
Object detection data set created with the DeepGTAV engine, which is based on the video game GTAV. It is one of the three data sets proposed in the paper. This data set is modelled on the SeaDronesSee data set and uses almost the same classes.
Object detection data set created with the DeepGTAV engine, which is based on the video game GTAV. It is one of the three data sets proposed in the paper. This data set is modelled on the VisDrone data set and uses almost the same classes.
2 PAPERS • NO BENCHMARKS YET
CPPE-5 (Medical Personal Protective Equipment) is a new, challenging dataset intended to enable the study of subordinate categorization of medical personal protective equipment, which is not possible with other popular data sets that focus on broad-level categories.
Minor Irrigation Structures Check-Dam Dataset is a public dataset annotated by domain experts using images from Google static map for instance segmentation and object detection tasks.
The Hands Guns and Phones (HGP) dataset contains 2199 images (1989 for training and 210 for testing) of people using guns or phones in real-world scenarios (people making phone reviews, doing shooting drills, or making calls). Every image in this dataset is labeled with the bounding boxes of Hands, Phones and Guns. All images were collected from YouTube videos and have different sizes.
This is a detailed description of the dataset, a data sheet for the dataset as proposed by Gebru et al.
The Aircraft Context Dataset, a composition of two inter-compatible large-scale and versatile image datasets focusing on manned aircraft and UAVs, is intended for training and evaluating classification, detection and segmentation models in aerial domains. Additionally, a set of relevant meta-parameters can be used to quantify dataset variability as well as the impact of environmental conditions on model performance.
3 PAPERS • NO BENCHMARKS YET
LLVIP is a Visible-infrared Paired Dataset for Low-light Vision. It contains 30976 images (15488 pairs) covering 24 dark scenes and 2 daytime scenes. It supports image-to-image translation (visible to infrared, or infrared to visible), visible and infrared image fusion, low-light pedestrian detection, and infrared pedestrian detection. The original image and video pairs (before registration) are also released.
53 PAPERS • 6 BENCHMARKS
Five domains (synthetic, document, street view, handwritten, and car license) with over five million images.
2 PAPERS • 2 BENCHMARKS
STN PLAD is a high-resolution and real-world image dataset of multiple high-voltage power line components. It has 2,409 annotated objects divided into five classes: transmission tower, insulator, spacer, tower plate, and Stockbridge damper, which vary in size (resolution), orientation, illumination, angulation, and background.
data/images:
We present TNCR, a new table dataset with varying image quality collected from free open source websites. TNCR dataset can be used for table detection in scanned document images and their classification into 5 different classes.
The RailEye3D dataset, a collection of train-platform scenarios for applications targeting passenger safety and automation of train dispatching, consists of 10 image sequences captured at 6 railway stations in Austria. Annotations for multi-object tracking are provided in both a unified format and the ground-truth format used in the MOTChallenge.
The UAVVaste dataset consists to date of 772 images and 3716 annotations. The main motivation for creating the dataset was the lack of domain-specific data in the datasets that are widely used for object detection evaluation benchmarking. The dataset is publicly available and is intended to be expanded.
The dataset of Thermal Bridges on Building Rooftops (TBBR dataset) consists of annotated combined RGB and thermal drone images with a height map. All images were converted to a uniform format of 3000$\times$4000 pixels, aligned, and cropped to 2400$\times$3400 to remove empty borders.
The dataset is split between train, test and val folders.
6 PAPERS • NO BENCHMARKS YET
This dataset contains 369 images of trash for use in deep learning. Each image was manually labelled by our team for accurate detection, giving a total of 470 bounding boxes. There are 4 classes in total: 0 (glass), 1 (paper), 2 (metal), and 3 (plastic).
OSAI introduces OpenTTGames - an open dataset aimed at evaluation of different computer vision tasks in Table Tennis: ball detection, semantic segmentation of humans, table and scoreboard and fast in-game events spotting.
The complete blood count (CBC) dataset contains 360 blood smear images along with their annotation files, split into training, testing, and validation sets. The training folder contains 300 images with annotations. The testing and validation folders each contain 60 images with annotations. We modified the original dataset to prepare this CBC dataset: some of the image annotation files listed far fewer red blood cells (RBCs) than were actually present, and one annotation file did not include any RBCs at all although the cell smear image contains RBCs. We therefore removed all the erroneous files and split the dataset into three parts. Of the 360 smear images, 300 blood cell images with annotations are used as the training set, and the remaining 60 images with annotations are used as the testing set. Due to the shortage of data, a subset of the training set is used to prepare the validation set, which contains 60 images with annotations.
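The split described above (300 training images, 60 testing images, and a 60-image validation set drawn from the training images) can be sketched as follows. The description does not say how images were assigned to each part, so the random shuffling, the seed, the function name, and the integer IDs are all assumptions made for illustration.

```python
import random


def split_cbc(image_ids, n_train=300, n_val=60, seed=0):
    """Split IDs into train/test, then draw validation as a subset of train."""
    ids = sorted(image_ids)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    rng.shuffle(ids)
    train, test = ids[:n_train], ids[n_train:]
    # Validation overlaps with training by design, as in the description.
    val = rng.sample(train, n_val)
    return train, val, test


# Hypothetical integer IDs standing in for the 360 smear images.
train_ids, val_ids, test_ids = split_cbc(range(360))
```

Because the validation images are also training images, validation scores on such a split tend to be optimistic; the held-out testing set is the more reliable estimate.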
We introduce an object detection dataset in challenging adverse weather conditions covering 12,000 samples in real-world driving scenes and 1,500 samples in controlled weather conditions within a fog chamber. The dataset includes different weather conditions like fog, snow, and rain and was acquired over more than 10,000 km of driving in northern Europe. The driven route with cities along the road is shown on the right. In total, 100k objects were labeled with accurate 2D and 3D bounding boxes. The main contributions of this dataset are: - We provide a proving ground for a broad range of algorithms covering signal enhancement, domain adaptation, object detection, or multi-modal sensor fusion, focusing on the learning of robust redundancies between sensors, especially if they fail asymmetrically in different weather conditions. - The dataset was created with the initial intention to showcase methods that learn robust redundancies between the sensors and enable a raw data sensor fusion in cas
15 PAPERS • 2 BENCHMARKS
OpenImages V6 is a large-scale dataset consisting of 9 million training images, 41,620 validation samples, and 125,456 test samples. It is a partially annotated dataset with 9,600 trainable classes.
18 PAPERS • 3 BENCHMARKS