KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) is one of the most popular datasets in mobile robotics and autonomous driving. It consists of hours of traffic scenarios recorded with a variety of sensor modalities, including high-resolution RGB and grayscale stereo cameras and a 3D laser scanner. Despite its popularity, the dataset itself does not contain ground truth for semantic segmentation. However, various researchers have manually annotated parts of the dataset to fit their needs. Álvarez et al. generated ground truth for 323 images from the road detection challenge with three classes: road, vertical, and sky. Zhang et al. annotated 252 acquisitions (140 for training and 112 for testing), each consisting of an RGB image and a Velodyne scan, from the tracking challenge for ten object categories: building, sky, road, vegetation, sidewalk, car, pedestrian, cyclist, sign/pole, and fence. Ros et al. labeled 170 training images and 46 testing images from the visual odometry challenge.
2,428 PAPERS • 120 BENCHMARKS
The NYU-Depth V2 dataset comprises video sequences from a variety of indoor scenes, recorded by both the RGB and depth cameras of the Microsoft Kinect. It features 1,449 densely labeled pairs of aligned RGB and depth images, 464 scenes taken from three cities, and 407,024 unlabeled frames; each object is labeled with a class and an instance number.
601 PAPERS • 17 BENCHMARKS
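The 1,449 labeled pairs ship as a single MATLAB v7.3 file (nyu_depth_v2_labeled.mat), which is an HDF5 container and can be read directly with h5py. A minimal sketch, assuming the official labeled release and its usual key names (images, depths, labels); note that h5py exposes MATLAB arrays with reversed axes:

```python
import h5py
import numpy as np

# nyu_depth_v2_labeled.mat is a MATLAB v7.3 (HDF5) file
with h5py.File("nyu_depth_v2_labeled.mat", "r") as f:
    # MATLAB stores HxWx3xN; h5py reads it back as (N, 3, W, H)
    rgb = np.transpose(f["images"][0], (2, 1, 0))  # (480, 640, 3), uint8
    depth = np.transpose(f["depths"][0], (1, 0))   # (480, 640), meters
    label = np.transpose(f["labels"][0], (1, 0))   # (480, 640), class ids

print(rgb.shape, depth.dtype, int(label.max()))
```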
The Matterport3D dataset is a large RGB-D dataset for scene understanding in indoor environments. It contains 10,800 panoramic views inside 90 real building-scale scenes, constructed from 194,400 RGB-D images. Each scene is a residential building consisting of multiple rooms and floor levels, and is annotated with surface reconstructions, camera poses, and semantic segmentations.
268 PAPERS • 4 BENCHMARKS
The dataset was collected using the Intel RealSense D435i camera, which was configured to produce synchronized accelerometer and gyroscope measurements at 400 Hz, along with synchronized VGA-size (640 x 480) RGB and depth streams at 30 Hz. The depth frames are acquired using active stereo and are aligned to the RGB frames using the sensor's factory calibration. All measurements are timestamped.
11 PAPERS • 1 BENCHMARK
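As a rough illustration of how such a stream configuration looks with the librealsense Python bindings (pyrealsense2), here is a hedged sketch; it is not the dataset's actual capture code, and note that 250 Hz is the D435i accelerometer's nearest native rate:

```python
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
# VGA RGB and depth streams at 30 Hz, as described above
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
# IMU streams: gyro at 400 Hz, accel at its nearest native rate (250 Hz)
config.enable_stream(rs.stream.gyro, rs.format.motion_xyz32f, 400)
config.enable_stream(rs.stream.accel, rs.format.motion_xyz32f, 250)

pipeline.start(config)
# rs.align reprojects depth onto the color frame via the factory calibration
align = rs.align(rs.stream.color)
try:
    for _ in range(60):  # video framesets arrive interleaved with motion frames
        frames = pipeline.wait_for_frames()
        depth = frames.get_depth_frame()
        color = frames.get_color_frame()
        if depth and color:
            aligned = align.process(frames)
            print("timestamps (ms):", depth.get_timestamp(), color.get_timestamp())
            break
finally:
    pipeline.stop()
```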
The KITTI-Depth dataset includes depth maps from projected LiDAR point clouds that were matched against depth estimates from the stereo cameras. The depth maps are highly sparse: only about 5% of the pixels carry valid values, and the rest are missing. The dataset has 86k training images, 7k validation images, and 1k test images evaluated on a benchmark server, with no access to the test ground truth.
10 PAPERS • NO BENCHMARKS YET
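The depth maps follow the KITTI depth devkit convention: 16-bit PNGs in which a pixel value v encodes a depth of v/256 meters and 0 marks a missing measurement. A short decoding sketch (the file path is a placeholder):

```python
import numpy as np
from PIL import Image

# 16-bit PNG; value / 256.0 gives meters, 0 means "no measurement"
raw = np.asarray(Image.open("0000000005.png"), dtype=np.uint16)  # placeholder path
depth_m = raw.astype(np.float32) / 256.0
valid = raw > 0

print(f"valid pixels: {valid.mean():.1%}")  # around 5% for the raw LiDAR maps
```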
TransCG is the first large-scale real-world dataset for transparent object depth completion and grasping. It contains 57,715 RGB-D images of 51 transparent objects, along with many opaque objects, captured from different perspectives (~240 viewpoints) across 130 scenes under real-world settings. The samples were captured by two different types of cameras (RealSense D435 and L515).
2 PAPERS • 1 BENCHMARK
The Bosch Industrial Depth Completion Dataset (BIDCD) is an RGB-D dataset of static table-top scenes with industrial objects. The data was collected with a RealSense depth camera mounted on a robotic arm, i.e., from multiple points of view (POVs), approximately 60 per scene. Depth ground truth was generated with a customized pipeline for removing erroneous depth values; multi-view geometry was then applied to fuse the cleaned depth frames and fill in missing information. The fused scene mesh was back-projected to each POV, and finally a bilateral filter was applied to reduce the remaining holes.
1 PAPER • NO BENCHMARKS YET
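The final smoothing step of that pipeline can be sketched with OpenCV's bilateral filter; the parameter values below are illustrative, not the ones used for BIDCD, and the input file is hypothetical:

```python
import cv2
import numpy as np

# Hypothetical fused depth map (float32, meters) from the multi-view stage
depth = np.load("fused_depth.npy").astype(np.float32)

# Edge-preserving smoothing: the spatial term averages over a 9-pixel
# neighborhood, while the range term (here 5 cm, since depth is in meters)
# keeps depth discontinuities at object boundaries sharp.
smoothed = cv2.bilateralFilter(depth, d=9, sigmaColor=0.05, sigmaSpace=5.0)
```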
PLAD is a dataset in which sparse depth is provided by line-based visual SLAM; it was created to validate StructMDC.
1 PAPER • 1 BENCHMARK
SuperCaustics is a simulation tool built on Unreal Engine for generating large-scale computer vision datasets that include transparent objects.