KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) is one of the most popular datasets for mobile robotics and autonomous driving. It consists of hours of traffic scenarios recorded with a variety of sensor modalities, including high-resolution RGB cameras, grayscale stereo cameras, and a 3D laser scanner. Despite its popularity, the dataset itself does not contain ground truth for semantic segmentation. However, various researchers have manually annotated parts of the dataset to suit their needs. Álvarez et al. generated ground truth for 323 images from the road detection challenge with three classes: road, vertical, and sky. Zhang et al. annotated 252 acquisitions (140 for training and 112 for testing) – RGB and Velodyne scans – from the tracking challenge for ten object categories: building, sky, road, vegetation, sidewalk, car, pedestrian, cyclist, sign/pole, and fence. Ros et al. labeled 170 training images and 46 testing images from the visual odometry challenge.
3,167 PAPERS • 139 BENCHMARKS
X3D is a dataset containing 15 scenes and covering 4 applications for X-ray 3D reconstruction. More specifically, the X3D dataset includes the scenes of (1) medicine: jaw, leg, chest, foot, abdomen, aneurism, pelvis, pancreas, head; (2) biology: carp, bonsai; (3) security: box, backpack; (4) industry: engine, teapot.
7 PAPERS • 2 BENCHMARKS
UASOL is an RGB-D stereo dataset containing 160,902 frames filmed at 33 different scenes, each with between 2k and 10k frames. The frames show different paths from the perspective of a pedestrian, including sidewalks, trails, roads, etc. The images were extracted from video files recorded at 15 fps in HD2K resolution, with a size of 2280 × 1282 pixels. The dataset also provides a GPS geolocalization tag for each second of the sequences and reflects different climatological conditions. Up to 4 different people filmed the dataset at different times of day.
3 PAPERS • 1 BENCHMARK
The IBL-NeRF dataset contains multi-view images together with their intrinsic components.
2 PAPERS • NO BENCHMARKS YET
The LLNeRF dataset is a real-world dataset that serves as a benchmark for model learning and evaluation. To obtain real low-illumination images with realistic noise distributions, photos were taken of nighttime outdoor scenes or low-light indoor scenes containing diverse objects. Since ISP operations are device-dependent and noise distributions differ across devices, the data was collected with both a mobile phone camera and a DSLR camera to enrich the diversity of the dataset.
1 PAPER • NO BENCHMARKS YET
A synthetic dataset comprising three different environments for multi-camera dynamic novel view synthesis for soccer. The dataset is made compatible with Nerfstudio and includes data parsers with various settings to reproduce the experiments of our paper "Dynamic NeRFs for Soccer Scenes" and more.