For many fundamental scene understanding tasks, it is difficult or impossible to obtain per-pixel ground truth labels from real images. Hypersim is a photorealistic synthetic dataset for holistic indoor scene understanding. It contains 77,400 images of 461 indoor scenes with detailed per-pixel labels and corresponding ground truth geometry.
61 PAPERS • 1 BENCHMARK
IDD is a dataset for road scene understanding in unstructured environments used for semantic segmentation and object detection for autonomous driving. It consists of 10,004 images, finely annotated with 34 classes collected from 182 drive sequences on Indian roads.
86 PAPERS • NO BENCHMARKS YET
A three million frame, multi-view, furniture assembly video dataset that includes depth, atomic actions, object segmentation, and human pose.
16 PAPERS • NO BENCHMARKS YET
Consists of user-generated aerial videos from social media with annotations of instance-level building damage masks. This provides the first benchmark for quantitative evaluation of models to assess building damage using aerial videos.
3 PAPERS • NO BENCHMARKS YET
The data set contains 38 patches (of the same size), each consisting of a true orthophoto (TOP) extracted from a larger TOP mosaic.
21 PAPERS • 1 BENCHMARK
The data set contains 33 patches (of different sizes), each consisting of a true orthophoto (TOP) extracted from a larger TOP mosaic.
15 PAPERS • 1 BENCHMARK
The Image and Video Advertisements collection consists of an image dataset of 64,832 image ads and a video dataset of 3,477 ads. The data contains rich annotations encompassing the topic and sentiment of the ads, questions and answers describing what actions the viewer is prompted to take and the reasoning that the ad presents to persuade the viewer ("What should I do according to this ad, and why should I do it?"), and symbolic references the ads make (e.g. a dove symbolizes peace).
InteriorNet is an RGB-D dataset for large-scale interior scene understanding and mapping. The dataset contains 20M images created by a rendering pipeline.
26 PAPERS • NO BENCHMARKS YET
Consists of annotated frames containing GI procedure tools such as snares, balloons, and biopsy forceps. Besides the images, the dataset includes ground truth masks and bounding boxes and has been verified by two expert GI endoscopists.
14 PAPERS • 3 BENCHMARKS
Kvasir-SEG is an open-access dataset of gastrointestinal polyp images and corresponding segmentation masks, manually annotated by a medical doctor and then verified by an experienced gastroenterologist.
140 PAPERS • 3 BENCHMARKS
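Models trained on segmentation datasets like Kvasir-SEG are commonly evaluated with overlap metrics such as the Dice coefficient between a predicted mask and the ground truth. A minimal illustrative sketch (the toy masks and function name are our own, not part of the dataset):

```python
import numpy as np

def dice_coefficient(pred, gt, eps=1e-7):
    """Dice similarity between two binary masks (H, W arrays of 0/1)."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    return (2.0 * intersection + eps) / (pred.sum() + gt.sum() + eps)

# Toy 4x4 masks: prediction covers 2 of the 3 ground-truth polyp pixels.
gt = np.zeros((4, 4), dtype=np.uint8)
gt[1, 1:4] = 1                      # 3 ground-truth pixels
pred = np.zeros((4, 4), dtype=np.uint8)
pred[1, 1:3] = 1                    # 2 predicted pixels, both correct
print(round(dice_coefficient(pred, gt), 3))  # 2*2/(2+3) = 0.8
```

The small epsilon keeps the metric defined when both masks are empty, a common edge case in frames without polyps.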
Lost and Found is a novel lost-cargo image sequence dataset comprising more than two thousand frames with pixel-wise annotations of obstacles and free space, along with a thorough comparison to several stereo-based baseline methods. The dataset will be made available to the community to foster further research on this important topic.
46 PAPERS • 1 BENCHMARK
MJU-Waste is an RGBD waste object segmentation dataset that is made public to facilitate future research in this area.
4 PAPERS • 1 BENCHMARK
The Multiple Light Source dataset (MLS) is a collection of 24 multiple-object scenes, each recorded under 18 multiple-light-source illumination scenarios. The illuminants vary in dominant spectral colour, intensity, and distance from the scene. The dataset can be used for the evaluation of computational colour constancy algorithms. Along with the images of the scenes, the spectral characteristics of the camera, light sources, and objects are also provided. Each image includes pixel-by-pixel ground truth annotation of uniformly coloured object surfaces, making the dataset also useful for benchmarking colour-based image segmentation algorithms.
1 PAPER • NO BENCHMARKS YET
MVTec D2S is a benchmark for instance-aware semantic segmentation in an industrial domain. It contains 21,000 high-resolution images with pixel-wise labels of all object instances. The objects comprise groceries and everyday products from 60 categories. The benchmark is designed such that it resembles the real-world setting of an automatic checkout, inventory, or warehouse system. The training images only contain objects of a single class on a homogeneous background, while the validation and test sets are much more complex and diverse.
2 PAPERS • NO BENCHMARKS YET
MatterportLayout extends the Matterport3D dataset with general Manhattan layout annotations. It has 2,295 RGBD panoramic images from Matterport3D which are extended with ground truth 3D layouts.
5 PAPERS • NO BENCHMARKS YET
The MixedWM38 dataset (WaferMap) has more than 38,000 wafer maps, including 1 normal pattern, 8 single-defect patterns, and 29 mixed-defect patterns, for a total of 38 defect patterns.
1 PAPER • 2 BENCHMARKS
ModaNet is a street fashion images dataset consisting of annotations related to RGB images. ModaNet provides multiple polygon annotations for each image. Each polygon is associated with a label from 13 meta fashion categories. The annotations are based on images in the PaperDoll image set, which has only a few hundred images annotated by the superpixel-based tool.
19 PAPERS • 1 BENCHMARK
The X-ray images in this dataset have been acquired from the tuberculosis control program of the Department of Health and Human Services of Montgomery County, MD, USA. The set contains 138 posterior-anterior x-rays, of which 80 are normal and 58 are abnormal with manifestations of tuberculosis. All images are de-identified and available in DICOM format. The set covers a wide range of abnormalities, including effusions and miliary patterns, and includes radiology readings available as text files along with a summary of their content.
7 PAPERS • 1 BENCHMARK
This dataset consists of four sets of flower images, from three different species: apple, peach, and pear, and accompanying ground truth images. The images were acquired under a range of imaging conditions. These datasets support work in an accompanying paper that demonstrates a flower identification algorithm that is robust to uncontrolled environments and applicable to different flower species. While this data is primarily provided to support that paper, other researchers interested in flower detection may also use the dataset to develop new algorithms. Flower detection is a problem of interest in orchard crops because it is related to management of fruit load.
Northumberland Dolphin Dataset 2020 (NDD20) is a challenging image dataset annotated for both coarse and fine-grained instance segmentation and categorisation. This dataset, the first release of the NDD, was created in response to the rapid expansion of computer vision into conservation research and the production of field-deployable systems suited to extreme environmental conditions -- an area with few open source datasets. NDD20 contains a large collection of above and below water images of two different dolphin species for traditional coarse and fine-grained segmentation.
4 PAPERS • NO BENCHMARKS YET
ODMS is a dataset for learning Object Depth via Motion and Segmentation. ODMS training data are configurable and extensible, with each training example consisting of a series of object segmentation masks, camera movement distances, and ground truth object depth. As a benchmark evaluation, the dataset provides four ODMS validation and test sets with 15,650 examples in multiple domains, including robotics and driving.
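The underlying geometry of the ODMS task can be illustrated with a simple pinhole-camera relation: if the camera moves straight toward an object, the object's apparent size in the segmentation mask grows in inverse proportion to its depth. The sketch below shows only this textbook relation, not the dataset's learned methods; the function and variable names are our own:

```python
def depth_from_expansion(h1, h2, move):
    """
    Estimate final object depth from apparent size change under a pinhole
    camera that moves 'move' meters straight toward the object.
    h1: object height in pixels at the first camera position,
    h2: height at the second (closer) position, with h2 > h1.
    Pinhole model: h is proportional to 1/z, so h1*z1 = h2*z2
    with z1 = z2 + move; solving gives z2 = h1*move / (h2 - h1).
    """
    return h1 * move / (h2 - h1)

# Object appears 50 px tall, then 100 px after moving 2 m closer:
# z2 = 50*2/(100-50) = 2.0 m (so it started 4.0 m away).
print(depth_from_expansion(50, 100, 2.0))  # 2.0
```

In practice a learned model is needed because real masks are noisy and camera motion is rarely purely axial, which is exactly what the configurable ODMS training data is designed to cover.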
Objects365 is a large-scale object detection dataset with 365 object categories over 600K training images. More than 10 million high-quality bounding boxes are manually labeled through a three-step, carefully designed annotation pipeline. It is the largest fully annotated object detection dataset so far and establishes a more challenging benchmark for the community.
134 PAPERS • 3 BENCHMARKS
OpenEDS (Open Eye Dataset) is a large-scale dataset of eye images captured using a virtual-reality (VR) head-mounted display mounted with two synchronized eye-facing cameras at a frame rate of 200 Hz under controlled illumination. This dataset is compiled from video capture of the eye region collected from 152 individual participants and is divided into four subsets: (i) 12,759 images with pixel-level annotations for key eye regions (iris, pupil, and sclera); (ii) 252,690 unlabelled eye images; (iii) 91,200 frames from randomly selected video sequences of 1.5 seconds in duration; and (iv) 143 pairs of left and right point cloud data compiled from corneal topography of eye regions collected from a subset, 143 out of 152, of the participants in the study.
24 PAPERS • 1 BENCHMARK
OpenEDS2020 is a dataset of eye-image sequences captured at a frame rate of 100 Hz under controlled illumination, using a virtual-reality head-mounted display mounted with two synchronized eye-facing cameras. The dataset, which is anonymized to remove any personally identifiable information on participants, consists of 80 participants of varied appearance performing several gaze-elicited tasks, and is divided in two subsets: 1) Gaze Prediction Dataset, with up to 66,560 sequences containing 550,400 eye-images and respective gaze vectors, created to foster research in spatio-temporal gaze estimation and prediction approaches; and 2) Eye Segmentation Dataset, consisting of 200 sequences sampled at 5 Hz, with up to 29,500 images, of which 5% contain a semantic segmentation label, devised to encourage the use of temporal information to propagate labels to contiguous frames.
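Gaze prediction on data like OpenEDS2020 is typically scored by the angular error between the predicted and ground-truth 3D gaze vectors. A minimal sketch of that standard metric (the function name and toy vectors are our own):

```python
import numpy as np

def angular_error_deg(g_pred, g_true):
    """Angle in degrees between predicted and ground-truth 3D gaze vectors."""
    g_pred = np.asarray(g_pred, dtype=float)
    g_true = np.asarray(g_true, dtype=float)
    cos = np.dot(g_pred, g_true) / (np.linalg.norm(g_pred) * np.linalg.norm(g_true))
    # Clip guards against tiny floating-point overshoot outside [-1, 1].
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Two orthogonal gaze directions are 90 degrees apart:
print(angular_error_deg([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 90.0
```

Normalizing by the vector norms makes the metric independent of vector magnitude, so predictions need not be unit-length.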
OpenSurfaces is a large database of annotated surfaces created from real-world consumer photographs. The framework used for the annotation process draws on crowdsourcing to segment surfaces from photos, and then annotate them with rich surface properties, including material, texture and contextual information.
0 PAPERS • NO BENCHMARKS YET
OxUvA is a dataset and benchmark for evaluating single-object tracking algorithms.
34 PAPERS • NO BENCHMARKS YET
PASCAL VOC 2007 is a dataset for image recognition covering twenty selected object classes.
119 PAPERS • 14 BENCHMARKS
PASCAL VOC 2011 is an image segmentation dataset. It contains 2,223 images for training, consisting of 5,034 objects. Testing consists of 1,111 images with 2,028 objects. In total there are over 5,000 precisely segmented objects for training.
19 PAPERS • 2 BENCHMARKS
SCC Data Set
109 PAPERS • 3 BENCHMARKS
Projection of the RibFrac CT dataset onto a 2D plane to imitate X-ray data, for a total of 880 images with multi-label segmentation masks. The dataset contains 92 fine-grained individual labels of anatomical structures which, when including super-classes, lead to a total of 166 labels in both lateral and frontal views.
PST900 is a dataset of 894 synchronized and calibrated RGB and Thermal image pairs with per pixel human annotations across four distinct classes from the DARPA Subterranean Challenge.
27 PAPERS • 1 BENCHMARK
PanNuke is a semi-automatically generated nuclei instance segmentation and classification dataset with exhaustive nuclei labels across 19 different tissue types. The dataset consists of 481 visual fields, of which 312 are randomly sampled from more than 20K whole-slide images at different magnifications, from multiple data sources. In total the dataset contains 205,343 labeled nuclei, each with an instance segmentation mask.
42 PAPERS • 3 BENCHMARKS
Paris-Lille-3D is a benchmark for point cloud classification. The point cloud has been labeled entirely by hand with 50 different classes. The dataset consists of around 2 km of Mobile Laser System point cloud acquired in two cities in France (Paris and Lille).
13 PAPERS • 1 BENCHMARK
PartNet is a consistent, large-scale dataset of 3D objects annotated with fine-grained, instance-level, and hierarchical 3D part information. The dataset consists of 573,585 part instances over 26,671 3D models covering 24 object categories. This dataset enables and serves as a catalyst for many tasks such as shape analysis, dynamic 3D scene modeling and simulation, affordance analysis, and others.
124 PAPERS • 3 BENCHMARKS
A new benchmark dataset of webcam images, Photi-LakeIce, from multiple cameras and two different winters, along with pixel-wise ground truth annotations.
A database of images of approximately 960 unique plants belonging to 12 species at several growth stages is made publicly available. It comprises annotated RGB images with a physical resolution of roughly 10 pixels per mm.
6 PAPERS • NO BENCHMARKS YET
The REFUGE Challenge provides a dataset of 1,200 fundus images with ground truth segmentations and clinical glaucoma labels, currently the largest of its kind.
13 PAPERS • 5 BENCHMARKS
RELLIS-3D is a multi-modal dataset for off-road robotics. It was collected in an off-road environment containing annotations for 13,556 LiDAR scans and 6,235 images. The data was collected on the Rellis Campus of Texas A&M University and presents challenges to existing algorithms related to class imbalance and environmental topography. The dataset also provides full-stack sensor data in ROS bag format, including RGB camera images, LiDAR point clouds, a pair of stereo images, high-precision GPS measurement, and IMU data.
36 PAPERS • 2 BENCHMARKS
The RIT-18 dataset was built for the semantic segmentation of remote sensing imagery. It was collected with the Tetracam Micro-MCA6 multispectral imaging sensor flown on-board a DJI-1000 octocopter.
8 PAPERS • NO BENCHMARKS YET
A dataset which contains over 200 apartments.
A special scene graph for intelligent vehicles. Unlike classical data representations, this graph provides not only object proposals but also their pair-wise relationships. By organizing them in a topological graph, the data are explainable, fully connected, and can easily be processed by GCNs (Graph Convolutional Networks).
RoadTracer is a dataset for extraction of road networks from aerial images. It consists of a large corpus of high-resolution satellite imagery and ground truth road network graphs covering the urban core of forty cities across six countries. For each city, the dataset covers a region of approximately 24 sq km around the city center. The satellite imagery is obtained from Google at 60 cm/pixel resolution, and the road network from OSM.
37 PAPERS • 2 BENCHMARKS
A dataset with high resolution (4K) images and manually-annotated dense labels every 50 frames.
The SD-198 dataset contains 198 different diseases, spanning different types of eczema, acne, and various cancerous conditions. There are 6,584 images in total. A subset, SD-128, includes the classes with more than 20 image samples.
6 PAPERS • 6 BENCHMARKS
A dataset consisting of 180,662 triplets of dual-pol synthetic aperture radar (SAR) image patches, multi-spectral Sentinel-2 image patches, and MODIS land cover maps.
29 PAPERS • NO BENCHMARKS YET
SegTHOR (Segmentation of THoracic Organs at Risk) is a dataset dedicated to the segmentation of organs at risk (OARs) in the thorax, i.e. the organs surrounding the tumour that must be preserved from irradiations during radiotherapy. In this dataset, the OARs are the heart, the trachea, the aorta and the esophagus, which have varying spatial and appearance characteristics. The dataset includes 60 3D CT scans, divided into a training set of 40 and a test set of 20 patients, where the OARs have been contoured manually by an experienced radiotherapist.
23 PAPERS • NO BENCHMARKS YET
The SemanticPOSS dataset for 3D semantic segmentation contains 2,988 varied and complex LiDAR scans with a large number of dynamic instances. The data was collected at Peking University and uses the same data format as SemanticKITTI.
57 PAPERS • 2 BENCHMARKS
The SensatUrban dataset is an urban-scale photogrammetric point cloud dataset with nearly three billion richly annotated points, five times the number of labeled points in the largest existing point cloud dataset. The dataset consists of large areas from two UK cities, covering about 6 km^2 of the city landscape. In the dataset, each 3D point is labeled as one of 13 semantic classes, such as ground, vegetation, and car.