So2Sat LCZ42 consists of local climate zone (LCZ) labels for about half a million Sentinel-1 and Sentinel-2 image patches in 42 urban agglomerations (plus 10 additional smaller areas) across the globe. The dataset was labelled by 15 domain experts following a carefully designed labelling workflow and evaluation process over a period of six months.
11 PAPERS • 1 BENCHMARK
An open-source Multi-View Overhead Imagery dataset with 27 unique looks spanning a broad range of viewing angles (-32.5 to 54.0 degrees). Each image covers the same 665 square km geographic extent and is annotated with 126,747 building footprint labels, enabling direct assessment of the impact of viewpoint perturbation on model performance.
3 PAPERS • NO BENCHMARKS YET
Structured3D is a large-scale photo-realistic dataset containing 3.5K house designs (a) created by professional designers, with a variety of ground-truth 3D structure annotations (b) and generated photo-realistic 2D images (c). The dataset consists of rendered images and corresponding ground-truth annotations (e.g., semantic, albedo, depth, surface normal, layout) under different lighting and furniture configurations.
65 PAPERS • 1 BENCHMARK
Swiss3DCities is a dataset that is manually annotated for semantic segmentation with per-point labels, and is built using photogrammetry from images acquired by multirotors equipped with high-resolution cameras.
4 PAPERS • NO BENCHMARKS YET
Synscapes is a synthetic dataset for street scene parsing created using photorealistic rendering techniques; it demonstrates state-of-the-art results for training and validation and enables new types of analysis.
42 PAPERS • 1 BENCHMARK
SynthCity is a 367.9M point synthetic full colour Mobile Laser Scanning point cloud. Every point is assigned a label from one of nine categories.
8 PAPERS • NO BENCHMARKS YET
TACO is a growing image dataset of waste in the wild. It contains images of litter taken under diverse environments: woods, roads and beaches. These images are manually labelled and segmented according to a hierarchical taxonomy to train and evaluate object detection algorithms. The annotations are provided in COCO format.
10 PAPERS • NO BENCHMARKS YET
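Since TACO ships its annotations in COCO format, parsing them needs no special tooling. Below is a minimal sketch of reading a COCO-style annotation structure and grouping instances by image; the file name, category, and ID values are illustrative placeholders, not actual TACO entries (a real annotation file would be loaded with `json.load`).

```python
# Hypothetical minimal COCO-format annotation structure: the three
# top-level keys (images, annotations, categories) follow the COCO
# convention; the concrete values here are made up for illustration.
coco = {
    "images": [
        {"id": 1, "file_name": "beach_001.jpg", "width": 640, "height": 480}
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 2,
         "bbox": [120.0, 80.0, 60.0, 40.0],  # COCO bbox: [x, y, width, height]
         "segmentation": [[120, 80, 180, 80, 180, 120, 120, 120]],
         "area": 2400.0, "iscrowd": 0}
    ],
    "categories": [
        {"id": 2, "name": "Clear plastic bottle", "supercategory": "Bottle"}
    ],
}

def annotations_by_image(coco_dict):
    """Group annotation records by the image they belong to."""
    grouped = {}
    for ann in coco_dict["annotations"]:
        grouped.setdefault(ann["image_id"], []).append(ann)
    return grouped

# Map category IDs to names, then list each instance per image.
category_names = {c["id"]: c["name"] for c in coco["categories"]}
for ann in annotations_by_image(coco)[1]:
    print(category_names[ann["category_id"]], ann["bbox"])
```

The `supercategory` field is where TACO's hierarchical taxonomy would surface: fine-grained litter classes roll up into broader parent categories.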
TTPLA is a public dataset consisting of aerial images of transmission towers (TTs) and power lines (PLs). It can be used for detection and segmentation of transmission towers and power lines. It consists of 1,100 images with a resolution of 3,840×2,160 pixels, as well as 8,987 manually labelled instances of TTs and PLs.
7 PAPERS • NO BENCHMARKS YET
A photorealistic indoor dataset designed to enable the application of deep learning techniques to a wide variety of robotic vision problems. RobotriX consists of hyperrealistic indoor scenes explored by robot agents that also interact with objects in a visually realistic manner within the simulated world.
1 PAPER • NO BENCHMARKS YET
TOP is a synthetic dataset for topology optimization generated using ToPy. The dataset contains 10,000 objects, each comprising 100 iterations of the optimization process for a problem defined on a regular 40 × 40 grid.
2 PAPERS • NO BENCHMARKS YET
Toronto-3D is a large-scale urban outdoor point cloud dataset acquired by an MLS system in Toronto, Canada, for semantic segmentation. The dataset covers approximately 1 km of road and consists of about 78.3 million points. Each point has 10 attributes and is classified into one of 8 labelled object classes.
21 PAPERS • 1 BENCHMARK
Trans10K is a large-scale dataset for transparent object segmentation, consisting of 10,428 images of real scenarios with careful manual annotations, making it 10 times larger than existing datasets.
29 PAPERS • 1 BENCHMARK
UAS-based multispectral orthomosaics of vineyards from central Portugal.
0 PAPERS • NO BENCHMARKS YET
UAVid is a high-resolution UAV semantic segmentation dataset that brings new challenges, including large scale variation, moving object recognition, and temporal consistency preservation. The dataset consists of 30 video sequences capturing 4K high-resolution images in oblique views. In total, 300 images have been densely labelled with 8 classes for the semantic labelling task.
34 PAPERS • 2 BENCHMARKS
A large collection of interaction-rich video data which are annotated and analyzed.
The Vistas-NP dataset is an out-of-distribution detection dataset based on the Mapillary Vistas dataset. The original Vistas dataset consists of 18,000 training images and 2,000 validation images with 66 classes. In Vistas-NP the human classes are used as outliers due to their dispersion across scenes and their visual diversity from other objects. The dataset is created by moving all images containing the person class and the three rider classes to the test subset. Consequently, the dataset has 8,003 training images and 830 validation images, and the test set contains 11,167 images.
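The split construction described above can be sketched as a simple filter over per-image class sets: any image whose annotations include the person class or one of the three rider classes goes to the test subset. This is a rough illustration, not the authors' code, and the class names used here are hypothetical.

```python
# Hypothetical outlier classes for the Vistas-NP split; the actual
# Mapillary Vistas label names may differ.
OUTLIER_CLASSES = {"person", "rider-bicyclist",
                   "rider-motorcyclist", "rider-other"}

def split_images(images):
    """Partition images into (train_val, test) ID lists.

    images: iterable of (image_id, set_of_class_names) pairs.
    An image containing any outlier class is moved to the test split.
    """
    train_val, test = [], []
    for image_id, classes in images:
        if classes & OUTLIER_CLASSES:
            test.append(image_id)
        else:
            train_val.append(image_id)
    return train_val, test

# Toy usage: image "b" contains a person, so it lands in the test split.
imgs = [("a", {"road", "car"}), ("b", {"road", "person"})]
train_val_ids, test_ids = split_images(imgs)
```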
The Vocal Folds dataset is a dataset for automatic segmentation of laryngeal endoscopic images. It consists of 8 sequences from 2 patients, containing 536 hand-segmented in vivo colour images of the larynx captured during two different resection interventions, with a resolution of 512×512 pixels.
Synthetic training dataset of 50,000 depth images and 320,000 object masks using simulated heaps of 3D CAD models.
4 PAPERS • 1 BENCHMARK
WildDash is a benchmark that uses meta-information to calculate the robustness of a given algorithm with respect to individual hazards.
46 PAPERS • 2 BENCHMARKS
Fisheye cameras are commonly employed for obtaining a large field of view in surveillance, augmented reality, and in particular automotive applications. Despite their prevalence, there are few public datasets for detailed evaluation of computer vision algorithms on fisheye images. WoodScape is an extensive fisheye automotive dataset named after Robert Wood, who invented the fisheye camera in 1906. WoodScape comprises four surround-view cameras and nine tasks, including segmentation, depth estimation, 3D bounding box detection, and soiling detection. Semantic annotation of 40 classes at the instance level is provided for over 10,000 images, and annotations for the other tasks are provided for over 100,000 images.
49 PAPERS • 1 BENCHMARK
A balanced dataset of color names and RGB values for training classifiers.
iSAID contains 655,451 object instances across 15 categories in 2,806 high-resolution images. The images of iSAID are the same as those of the DOTA-v1.0 dataset: most are collected from Google Earth, some are taken by the JL-1 satellite, and the rest by the GF-2 satellite of the China Centre for Resources Satellite Data and Application.
59 PAPERS • 3 BENCHMARKS
Created from endoscopic video feeds of real-world surgical procedures. Overall, the data consists of 307 images, each annotated for the organs and the different surgical instruments present in the scene.