KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) is one of the most popular datasets for use in mobile robotics and autonomous driving. It consists of hours of traffic scenarios recorded with a variety of sensor modalities, including high-resolution RGB, grayscale stereo cameras, and a 3D laser scanner. Despite its popularity, the dataset itself does not contain ground truth for semantic segmentation. However, various researchers have manually annotated parts of the dataset to fit their necessities. Álvarez et al. generated ground truth for 323 images from the road detection challenge with three classes: road, vertical, and sky. Zhang et al. annotated 252 (140 for training and 112 for testing) acquisitions – RGB and Velodyne scans – from the tracking challenge for ten object categories: building, sky, road, vegetation, sidewalk, car, pedestrian, cyclist, sign/pole, and fence. Ros et al. labeled 170 training images and 46 testing images (from the visual odome
3,169 PAPERS • 139 BENCHMARKS
The Middlebury 2014 dataset contains a set of 23 high resolution stereo pairs for which known camera calibration parameters and ground truth disparity maps obtained with a structured light scanner are available. The images in the Middlebury dataset all show static indoor scenes with varying difficulties including repetitive structures, occlusions, wiry objects as well as untextured areas.
50 PAPERS • 2 BENCHMARKS
Spring is a large, high-resolution and high-detail, computer-generated benchmark for scene flow, optical flow, and stereo. Based on rendered scenes from the open-source Blender movie "Spring", it provides photo-realistic HD datasets with state-of-the-art visual effects and ground truth training data.
20 PAPERS • 3 BENCHMARKS
We provide all the expected data inputs to GUISS such as meshes, texture images, and blend files. Generated datasets used in our experiments along with the stereo depth estimations can be downloaded. We have defined seven dataset types: scene_reconstructions, texture_variation, gaea_texture_variation, generative_texture, terrain_variation, rocks, and generative_texture_snow. Each dataset type contains renderings with varying values of different parameters such as lighting angle, texture imgs, albedo, etc. Position each dataset type folder under data/dataset/.
1 PAPER • NO BENCHMARKS YET
The Sentinel-2 satellite carries 12 CMOS detectors for the VNIR bands, with adjacent detectors having overlapping fields of view that result in overlapping regions in level-1 B (L1B) images. This dataset includes 3740 pairs of overlapping image crops extracted from two L1B products. Each crop has a height of around 400 pixels and a variable width that depends on the overlap width between detectors for RGBN bands, typically around 120-200 pixels. In addition to detector parallax, there is also cross-band parallax for each detector, resulting in shifts between bands. Pre-registration is performed for both cross-band and cross-detector parallax, with a precision of up to a few pixels (typically less than 10 pixels).