DELIVER is an arbitrary-modal segmentation benchmark covering Depth, LiDAR, multiple Views, Events, and RGB. Beyond these modalities, the dataset also spans four severe weather conditions and five sensor-failure cases, enabling study of modal complementarity and robustness to partial sensor outages. It is designed for the task of arbitrary-modal semantic segmentation.
7 PAPERS • 2 BENCHMARKS
X-ray images in this data set have been acquired from the tuberculosis control program of the Department of Health and Human Services of Montgomery County, MD, USA. This set contains 138 posterior-anterior x-rays, of which 80 x-rays are normal and 58 x-rays are abnormal with manifestations of tuberculosis. All images are de-identified and available in DICOM format. The set covers a wide range of abnormalities, including effusions and miliary patterns. The data set also includes radiology readings, available as text files, and a summary of its content.
7 PAPERS • 1 BENCHMARK
The increasing incidence of melanoma has recently promoted the development of computer-aided diagnosis systems for the classification of dermoscopic images. The PH² dataset has been developed for research and benchmarking purposes, in order to facilitate comparative studies on both segmentation and classification algorithms of dermoscopic images. PH² is a dermoscopic image database acquired at the Dermatology Service of Hospital Pedro Hispano, Matosinhos, Portugal.
7 PAPERS • 3 BENCHMARKS
RailSem19 offers 8500 unique images taken from the ego-perspective of a rail vehicle (trains and trams). Extensive semantic annotations are provided, both geometry-based (rail-relevant polygons, all rails as polylines) and dense label maps with many Cityscapes-compatible road labels. Many frames show areas of intersection between road and rail vehicles (railway crossings, trams driving on city streets). RailSem19 is useful for rail applications and road applications alike.
7 PAPERS • NO BENCHMARKS YET
SketchyScene is a large-scale dataset of scene sketches to advance research on sketch understanding at both the object and scene level. The dataset is created through a novel and carefully designed crowdsourcing pipeline, enabling users to efficiently generate large quantities of realistic and diverse scene sketches. SketchyScene contains more than 29,000 scene-level sketches, 7,000+ pairs of scene templates and photos, and 11,000+ object sketches. All objects in the scene sketches have ground-truth semantic and instance masks. The dataset is also highly scalable and extensible, easily allowing augmenting and/or changing scene composition.
SpaceNet 2: Building Detection v2 - is a dataset for building footprint detection in geographically diverse settings from very high resolution satellite images. It contains over 302,701 building footprints, 3/8-band Worldview-3 satellite imagery at 0.3m pixel res., across 5 cities (Rio de Janeiro, Las Vegas, Paris, Shanghai, Khartoum), and covers areas that are both urban and suburban in nature. The dataset was split using 60%/20%/20% for train/test/validation.
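The 60%/20%/20% train/test/validation split described above can be sketched as follows. This is an illustrative helper only, not the official SpaceNet tooling; `split_tiles` and its seeding scheme are hypothetical.

```python
import random

def split_tiles(tile_ids, seed=0):
    """Shuffle tile identifiers and partition them 60/20/20 into
    train/test/validation lists (hypothetical helper, not the
    official SpaceNet split procedure)."""
    ids = list(tile_ids)
    random.Random(seed).shuffle(ids)
    n = len(ids)
    n_train = int(n * 0.6)
    n_test = int(n * 0.2)
    return (ids[:n_train],                      # 60% train
            ids[n_train:n_train + n_test],      # 20% test
            ids[n_train + n_test:])             # remaining 20% validation

train, test, val = split_tiles(range(100))
```

Fixing the random seed keeps the partition reproducible across runs, which matters when comparing models on the same split.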
TTPLA is a public dataset which is a collection of aerial images on Transmission Towers (TTs) and Power Lines (PLs). It can be used for detection and segmentation of transmission towers and power lines. It consists of 1,100 images with the resolution of 3,840×2,160 pixels, as well as manually labelled 8,987 instances of TTs and PLs.
The TrashCan dataset is an instance-segmentation dataset of underwater trash. It comprises 7,212 annotated images containing observations of trash, ROVs, and a wide variety of undersea flora and fauna. The annotations take the format of instance segmentation annotations: bitmaps containing a mask marking which pixels in the image belong to each object. The imagery in TrashCan is sourced from the J-EDI (JAMSTEC E-Library of Deep-sea Images) dataset, curated by the Japan Agency for Marine-Earth Science and Technology (JAMSTEC).
A high-resolution semantic segmentation dataset with 50 validation and 100 test objects. Image resolution in BIG ranges from 2048×1600 to 5000×3600. Every image in the dataset has been carefully labeled by a professional while keeping the same guidelines as PASCAL VOC 2012 without the void region.
6 PAPERS • 1 BENCHMARK
The Freiburg Forest dataset was collected using a Viona autonomous mobile robot platform equipped with cameras for capturing multi-spectral and multi-modal images. The dataset may be used for evaluation of different perception algorithms for segmentation, detection, classification, etc. All scenes were recorded at 20 Hz with a camera resolution of 1024x768 pixels. The data was collected on three different days to have enough variability in lighting conditions as shadows and sun angles play a crucial role in the quality of acquired images. The robot traversed about 4.7 km each day. The dataset creators provide manually annotated pixel-wise ground truth segmentation masks for 6 classes: Obstacle, Trail, Sky, Grass, Vegetation, and Void.
6 PAPERS • 2 BENCHMARKS
A database of images of approximately 960 unique plants belonging to 12 species at several growth stages is made publicly available. It comprises annotated RGB images with a physical resolution of roughly 10 pixels per mm.
6 PAPERS • NO BENCHMARKS YET
RoadAnomaly21 is a dataset for anomaly segmentation, the task of identifying the image regions containing objects that have never been seen during training. It consists of an evaluation dataset of 100 images with pixel-level annotations. Each image contains at least one anomalous object, e.g. animals or unknown vehicles. The anomalies can appear anywhere in the image and differ widely in size, covering from 0.5% to 40% of the image.
The SD-198 dataset contains 198 different diseases from different types of eczema, acne and various cancerous conditions. There are 6,584 images in total. A subset, named SD-128, includes only the classes with more than 20 image samples.
6 PAPERS • 6 BENCHMARKS
The SWIMSEG dataset contains 1013 images of sky/cloud patches, along with their corresponding binary segmentation maps. The ground truth annotation was done in consultation with experts from Singapore Meteorological Services. All images were captured in Singapore using WAHRSIS, a calibrated ground-based whole sky imager, over a period of 22 months from October 2013 to July 2015. Each patch covers about 60-70 degrees of the sky with a resolution of 600x600 pixels.
BRATS 2014 is a brain tumor segmentation dataset.
5 PAPERS • 1 BENCHMARK
Cata7 is the first cataract surgical instrument dataset for semantic segmentation. The dataset consists of seven videos, each of which records a complete cataract surgery. All videos are from Beijing Tongren Hospital. Each video is split into a sequence of images with a resolution of 1920×1080 pixels. To reduce redundancy, the videos are downsampled from 30 fps to 1 fps, and images without surgical instruments are manually removed. Each image is labeled with precise edges and types of surgical instruments. The dataset contains 2,500 images, divided into training and test sets: the training set consists of five video sequences and the test set of two video sequences.
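The temporal downsampling described above (30 fps to 1 fps) amounts to keeping every 30th frame. A minimal sketch, assuming frames are addressed by index (`sample_frame_indices` is a hypothetical helper, not part of the Cata7 release):

```python
def sample_frame_indices(total_frames, src_fps=30, dst_fps=1):
    """Return the frame indices kept when downsampling a clip
    from src_fps to dst_fps by uniform striding."""
    step = src_fps // dst_fps  # keep one frame per `step` source frames
    return list(range(0, total_frames, step))

# A 10-second clip at 30 fps (300 frames) yields 10 frames at 1 fps.
indices = sample_frame_indices(300)
```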
5 PAPERS • NO BENCHMARKS YET
Hi4D contains 4D textured scans of 20 subject pairs, 100 sequences, and a total of more than 11K frames. Hi4D contains rich interaction centric annotations in 2D and 3D alongside accurately registered parametric body models.
IDDA is a large scale, synthetic dataset for semantic segmentation with more than 100 different source visual domains. The dataset has been created to explicitly address the challenges of domain shift between training and test data in various weather and view point conditions, in seven different city types.
LabPics Chemistry Dataset
OST300 is an outdoor scene dataset with 300 test images of outdoor scenes, and a training set of 7 categories of images with rich textures.
PerSeg is a dataset for personalized segmentation. The raw images are collected from the training data of subject-driven diffusion models: DreamBooth, Textual Inversion, and Custom Diffusion. PerSeg contains 40 objects of various categories in total, including daily necessities, animals, and buildings. Contextualized in different poses or scenes, each object is associated with 5∼7 images with annotated masks.
A dataset which contains over 200 apartments.
The SWINSEG dataset contains 115 nighttime images of sky/cloud patches along with their corresponding binary ground truth maps. The ground truth annotation was done in consultation with experts from Singapore Meteorological Services. All images were captured in Singapore using WAHRSIS, a calibrated ground-based whole sky imager, over a period of 12 months from January to December 2016. All image patches are 500x500 pixels in size, and were selected considering several factors such as time of the image capture, cloud coverage, and seasonal variations.
SemanticSTF is an adverse-weather point cloud dataset that provides dense point-level annotations and enables studying 3D semantic segmentation (3DSS) under various adverse weather conditions. It contains 2,076 scans captured by a Velodyne HDL64 S3D LiDAR sensor from STF, covering various adverse weather conditions: 694 snowy, 637 dense-foggy, 631 light-foggy, and 114 rainy scans (all rainy LiDAR scans in STF).
TICaM is a Time-of-flight In-car Cabin Monitoring dataset for vehicle interior monitoring using a single wide-angle depth camera. This dataset addresses the deficiencies of other available in-car cabin datasets in terms of the range of labeled classes, recorded scenarios and provided annotations, all at the same time. It consists of an exhaustive list of actions performed while driving and multi-modal labeled images (depth, RGB and IR), with complete annotations for 2D and 3D object detection, instance and semantic segmentation, as well as activity annotations for RGB frames. In addition to real recordings, it also contains a synthetic dataset of in-car cabin images with the same multi-modality of images and annotations, providing a unique and highly useful combination of synthetic and real data for effectively training cabin monitoring systems and evaluating domain adaptation approaches.
UPLight is an underwater RGB-Polarization multimodal semantic segmentation dataset with 12 typical underwater semantic classes.
Embrapa Wine Grape Instance Segmentation Dataset (WGISD) contains grape clusters properly annotated in 300 images and a novel annotation methodology for segmentation of complex objects in natural images.
An enlarged dataset for studying how the image background affects computer vision ML models, covering the following topics: blurred backgrounds, segmented backgrounds, AI-generated backgrounds, bias of annotation tools, background color, background-dependent factors, latent-space distance of the foreground, and random backgrounds with real environments.
Research on semantic segmentation of traffic scenes using color and polarization information (including training and testing sets).
We design all-day CityScapes, an all-day semantic segmentation benchmark. It is the first semantic segmentation benchmark that contains samples from all-day scenarios, i.e., from dawn to night. Our dataset will be made publicly available at [https://isis-data.science.uva.nl/cv/1ADcityscape.zip].
4 PAPERS • 1 BENCHMARK
Video sequences from a glasshouse environment in Campus Kleinaltendorf (CKA), University of Bonn, captured by PATHoBot, a glasshouse monitoring robot.
4 PAPERS • NO BENCHMARKS YET
The CropAndWeed dataset is focused on the fine-grained identification of 74 relevant crop and weed species with a strong emphasis on data variability. Annotations of labeled bounding boxes, semantic masks and stem positions are provided for about 112k instances in more than 8k high-resolution images of both real-world agricultural sites and specifically cultivated outdoor plots of rare weed types. Additionally, each sample is enriched with meta-annotations regarding environmental conditions.
Cholecystectomy is a very common abdominal surgical procedure almost ubiquitously performed with a laparoscopic approach, hence guided by an endoscopic video. Deep learning models for LC video analysis have been developed with the aim of assisting surgeons during interventions, improving staff awareness and readiness, and facilitating postoperative documentation and research. However, datasets and models for video semantic segmentation of LC are lacking. Recognizing fine-grained hepatocystic anatomy through semantic segmentation could help surgeons better assess the critical view of safety (CVS), a universally recommended technique consisting in well exposing anatomical landmarks to prevent bile duct injuries. Additionally, segmentation masks of hepatocystic structures could be leveraged by deep learning models for automatic assessment of CVS and surgical action recognition to improve their performance. We believe that generating a dataset for video semantic segmentation of hepatocystic anatomy would address this need.
GolfDB is a high-quality video dataset created for general recognition applications in the sport of golf, and specifically for the task of golf swing sequencing.
The Kvasir-SEG dataset includes 196 polyps smaller than 10 mm classified as Paris class 1 sessile or Paris class IIa. We have selected it with the help of expert gastroenterologists. We have released this dataset separately as a subset of Kvasir-SEG. We call this subset Kvasir-Sessile.
Multimodal material segmentation (MCubeS) dataset contains 500 sets of images from 42 street scenes. Each scene has images for four modalities: RGB, angle of linear polarization (AoLP), degree of linear polarization (DoLP), and near-infrared (NIR). The dataset provides annotated ground truth labels for both material and semantic segmentation for every pixel. The dataset is divided into a training set with 302 image sets, a validation set with 96 image sets, and a test set with 102 image sets. Each image has 1224 x 1024 pixels and a total of 20 class labels per pixel.
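A sample in a four-modality dataset like MCubeS pairs one RGB image and three single-channel maps with a dense per-pixel label map. A minimal sketch of the resulting array shapes, using zero-filled placeholders in place of loaded images (the stacking into a 6-channel input is an illustrative choice, not the dataset's prescribed format):

```python
import numpy as np

H, W = 1024, 1224        # per-image resolution reported for MCubeS
NUM_CLASSES = 20

# Placeholder arrays standing in for the four loaded modalities:
rgb  = np.zeros((H, W, 3), dtype=np.float32)   # RGB
aolp = np.zeros((H, W), dtype=np.float32)      # angle of linear polarization
dolp = np.zeros((H, W), dtype=np.float32)      # degree of linear polarization
nir  = np.zeros((H, W), dtype=np.float32)      # near-infrared

# One way to combine all modalities: a single 6-channel input tensor.
x = np.concatenate(
    [rgb, aolp[..., None], dolp[..., None], nir[..., None]], axis=-1)

# Dense annotation: one class id per pixel.
labels = np.zeros((H, W), dtype=np.int64)
```

Note that the split counts add up to the stated total: 302 + 96 + 102 = 500 image sets.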
MatterportLayout extends the Matterport3D dataset with general Manhattan layout annotations. It has 2,295 RGBD panoramic images from Matterport3D which are extended with ground truth 3D layouts.
Middlebury 2001 is a stereo dataset of indoor scenes with multiple handcrafted layouts.
Northumberland Dolphin Dataset 2020 (NDD20) is a challenging image dataset annotated for both coarse and fine-grained instance segmentation and categorisation. This dataset, the first release of the NDD, was created in response to the rapid expansion of computer vision into conservation research and the production of field-deployable systems suited to extreme environmental conditions -- an area with few open source datasets. NDD20 contains a large collection of above and below water images of two different dolphin species for traditional coarse and fine-grained segmentation.
SAMRS is a remote sensing segmentation dataset which provides object category, location, and instance information that can be used for semantic segmentation, instance segmentation, and object detection, either individually or in combination.
Satlas is a remote sensing dataset and benchmark that is large in both breadth, featuring all of the aforementioned applications and more, and in scale, comprising 290M labels under 137 categories and 7 label modalities.
SpaceNet 1: Building Detection v1 is a dataset for building footprint detection. The data is comprised of 382,534 building footprints, covering an area of 2,544 sq. km of 3/8 band WorldView-2 imagery (0.5 m pixel res.) across the city of Rio de Janeiro, Brazil. The images are processed as 200m×200m tiles with associated building footprint vectors for training.
4 PAPERS • 2 BENCHMARKS
Swiss3DCities is a dataset that is manually annotated for semantic segmentation with per-point labels, and is built using photogrammetry from images acquired by multirotors equipped with high-resolution cameras.
TAS500 is a semantic segmentation dataset for autonomous driving in unstructured environments. TAS500 offers fine-grained vegetation and terrain classes to learn drivable surfaces and natural obstacles in outdoor scenes effectively.
A large collection of interaction-rich video data which are annotated and analyzed.
AeroRIT is a hyperspectral dataset to facilitate aerial hyperspectral scene understanding.
3 PAPERS • NO BENCHMARKS YET
The Aircraft Context Dataset, a composition of two inter-compatible large-scale and versatile image datasets focusing on manned aircraft and UAVs, is intended for training and evaluating classification, detection and segmentation models in aerial domains. Additionally, a set of relevant meta-parameters can be used to quantify dataset variability as well as the impact of environmental conditions on model performance.
DeepSportradar is a benchmark suite of computer vision tasks, datasets and benchmarks for automated sport understanding. DeepSportradar currently supports four challenging tasks related to basketball: ball 3D localization, camera calibration, player instance segmentation and player re-identification. For each of the four tasks, a detailed description of the dataset, objective, performance metrics, and the proposed baseline method are provided.