Multispectral and HD vineyard orthomosaics from central Portugal
OmniCity is a dataset for omnipotent city understanding from multi-level and multi-view images. It contains multi-view satellite images as well as street-level panorama and mono-view images, constituting over 100K pixel-wise annotated images that are well-aligned and collected from 25K geo-locations in New York City. This dataset introduces a new task of fine-grained building instance segmentation on street-level panorama images. It also provides new problem settings for existing tasks, such as cross-view image matching, synthesis, segmentation, and detection, and facilitates the development of new methods for large-scale city understanding, reconstruction, and simulation.
The Person In Context (PIC) dataset is a dataset for human-centric relation segmentation (HRS), which contains 17,122 high-resolution images and densely annotated entity segmentation and relations, including 141 object categories, 23 relation categories and 25 semantic human parts.
Panoramic Video Panoptic Segmentation Dataset is a large-scale dataset that offers high-quality panoptic segmentation labels for autonomous driving. The dataset has labels for 28 semantic categories and 2,860 temporal sequences that were captured by five cameras mounted on autonomous vehicles driving in three different geographical locations, leading to a total of 100k labeled camera images.
This dataset was built with data acquired at the Hospital Clinic of Barcelona, Spain. It is composed of a total of 1126 HD polyp images. There are a total of 473 unique polyps, with a variable number of different shots per polyp (minimum: 2, maximum: 24, median: 10). Special attention was paid to ensure that images from the same polyp show different conditions. An external frame-grabber and a white light endoscope were used to capture raw images. The dataset contains images with two different resolutions: 1920 x 1080 and 1350 x 1080.
A Video Dataset for Visual Perception and Autonomous Navigation in Unstructured Environments. Website: http://rugd.vision/
Risk-Aware Planning is a dataset containing overhead images and their semantic segmentation masks, captured by a drone in the CityEnviron environment of the AirSim simulator.
The SBCoseg dataset includes 889 groups of images, each consisting of 18 images with a common object, for 16,002 images in total. The whole dataset is divided into five subsets: with ECFB, with TR, with MH, with SD, and Normal (normal data). The five subsets contain 193, 251, 82, 83, and 280 image groups, respectively. Each original image is in JPG format with a size of 360 × 360 pixels, and each ground-truth image is in PNG format.
SYNTHIA-PANO is the panoramic version of the SYNTHIA dataset. Five sequences are included: Seqs02-summer, Seqs02-fall, Seqs04-summer, Seqs04-fall, and Seqs05-summer. It provides panoramic images with fine annotations for semantic segmentation.
A test dataset for semantic segmentation. The dataset includes 500 RGB images with corresponding single-channel binary masks. Images were taken from vineyards in Grugliasco, Turin, Piedmont Region, Italy.
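For datasets like this that pair RGB images with single-channel binary masks, a common evaluation step is computing intersection-over-union between a predicted and a ground-truth mask. A minimal sketch, using small NumPy arrays in place of loaded image files (the function name and toy masks are illustrative, not part of the dataset's tooling):

```python
import numpy as np

def binary_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU between two single-channel binary masks (nonzero = foreground)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    # Two empty masks are conventionally treated as a perfect match.
    return float(inter) / union if union else 1.0

# Toy 2x2 masks: one overlapping pixel, union of three pixels -> IoU = 1/3
pred = np.array([[1, 1], [0, 0]])
gt   = np.array([[0, 1], [1, 0]])
print(binary_iou(pred, gt))
```

The same function applies unchanged to full-size masks once they are loaded as arrays with any image library.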
SemanticUSL is a dataset for domain adaptation for LiDAR point cloud semantic segmentation. The dataset has the same data format and ontology as SemanticKITTI.
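Since SemanticUSL follows the SemanticKITTI data format, its per-point labels can presumably be parsed the same way: each point carries a packed 32-bit label whose lower 16 bits encode the semantic class and whose upper 16 bits encode the instance id. A minimal sketch of unpacking such labels (the helper name is illustrative; a real `.label` file would be read with `np.fromfile(path, dtype=np.uint32)`):

```python
import numpy as np

def split_labels(raw: np.ndarray):
    """Split packed SemanticKITTI-style uint32 labels into
    semantic class ids (lower 16 bits) and instance ids (upper 16 bits)."""
    raw = raw.astype(np.uint32)
    semantic = raw & 0xFFFF   # per-point semantic class id
    instance = raw >> 16      # per-point instance id
    return semantic, instance

# Synthetic packed label: instance 3, semantic class 40
packed = np.array([(3 << 16) | 40], dtype=np.uint32)
sem, inst = split_labels(packed)
print(int(sem[0]), int(inst[0]))
```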
A Sentinel-2-based, multi-country time-series benchmark dataset tailored for agricultural monitoring applications with machine and deep learning. The Sen4AgriNet dataset is annotated from farmer declarations collected via the Land Parcel Identification System (LPIS), harmonizing country-wide labels. Sen4AgriNet is the only multi-country, multi-year dataset that includes all spectral information. It is constructed to cover the period 2016-2020 for Catalonia and France, and can be extended to include additional countries. Currently, it contains 42.5 million parcels, which makes it significantly larger than other available archives.
TAS-NIR is a VIS+NIR dataset of semantically annotated images in unstructured outdoor environments. It consists of 209 VIS+NIR image pairs with a fine-grained semantic segmentation.
A high-resolution thermal infrared face database with extensive manual annotations, introduced by Kopaczka et al., 2018. It is useful for training algorithms for image processing tasks as well as facial expression recognition. The full database, all annotations, and the complete source code are freely available from the authors for research purposes at https://github.com/marcinkopaczka/thermalfaceproject.
The dataset contains procedurally generated images of transparent vessels containing liquids and objects. The data for each image includes segmentation maps, 3D depth maps, and normal maps of both the vessel and the liquid or object inside it. In addition, the material properties of the contents are given (color, transparency, roughness, metalness). A natural-image benchmark for 3D/depth estimation of objects inside transparent containers is also supplied, along with 3D models of the objects (GLTF).
This is the first general Underwater Image Instance Segmentation (UIIS) dataset, containing 4,628 images across 7 categories with pixel-level annotations for the underwater instance segmentation task.
The semantic segmentation of clothes is a challenging task due to the wide variety of clothing styles, layers, and shapes. UTFPR-SBD3 contains 4,500 images manually annotated at the pixel level in 18 classes plus background. To ensure high quality, all images were manually annotated at the pixel level using JS Segment Annotator, a free web-based image annotation tool. The raw images were carefully selected to avoid, as far as possible, classes with a low number of instances.
VizWiz-FewShot is a few-shot localization dataset originating from photographers who were authentically trying to learn about the visual content in the images they took. It includes nearly 10,000 segmentations of 100 categories in over 4,500 images taken by people with visual impairments.
dacl10k stands for damage classification 10k images and is a multi-label semantic segmentation dataset covering 19 classes (13 damage types and 6 object types) found on bridges.
This dataset contains images of corn seeds, with the top and bottom views captured independently (two images per seed: top and bottom). There are four classes of corn seed: Broken (B), Discolored (D), Silkcut (S), and Pure (P). 17,802 images were labeled by experts at AdTech Corp., and a further 26K images were unlabeled, of which 9K were labeled using Active Learning (BatchBALD).
The Dense Material Segmentation Dataset (DMS) consists of 3 million polygon labels of material categories (metal, wood, glass, etc.) for 44 thousand RGB images. The dataset is described in the research paper, A Dense Material Segmentation Dataset for Indoor and Outdoor Scene Parsing.
The Lemon dataset has been prepared to investigate the possibilities of tackling the issue of fruit quality control. It contains 2,690 annotated images (1056 x 1056 pixels). Raw lemon images were captured using the procedure described in the following blogpost and manually annotated using CVAT.
Multi-grained Vehicle Parsing (MVP) is a large-scale dataset for semantic analysis of vehicles in the wild, which has several featured properties. 1. MVP contains 24,000 vehicle images captured in real-world surveillance scenes, which makes it more scalable for real applications. 2. For different requirements, the vehicle images are annotated with pixel-level part masks at two granularities, i.e., coarse annotations of ten classes and fine annotations of 59 classes. The former can be applied to object-level applications such as vehicle Re-Id, fine-grained classification, and pose estimation, while the latter can be explored for high-quality image generation and content manipulation. 3. The images reflect the complexity of real surveillance scenes, such as different viewpoints, illumination conditions, and backgrounds. In addition, the vehicles span diverse countries, types, brands, models, and colors, which makes the dataset more diverse and challenging.
The thickness and appearance of retinal layers are essential markers for diagnosing and studying eye diseases. Despite the increasing availability of imaging devices to scan and store large amounts of data, analyzing retinal images and generating trial endpoints has remained a manual, error-prone, and time-consuming task. In particular, the lack of large amounts of high-quality labels for different diseases hinders the development of automated algorithms. Therefore, we have compiled 5016 pixel-wise manual labels for 1672 optical coherence tomography (OCT) scans featuring two different diseases as well as healthy subjects to help democratize the process of developing novel automatic techniques. We also collected 4698 bounding box annotations for a subset of 566 scans across 9 classes of disease biomarker. Due to variations in retinal morphology, intensity range, and changes in contrast and brightness, designing segmentation and detection methods that can generalize to different diseases remains challenging.
OpenSurfaces is a large database of annotated surfaces created from real-world consumer photographs. The framework used for the annotation process draws on crowdsourcing to segment surfaces from photos, and then annotate them with rich surface properties, including material, texture and contextual information.
UAS-based Multispectral orthomosaics of vineyards from central Portugal