KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) is one of the most popular datasets for use in mobile robotics and autonomous driving. It consists of hours of traffic scenarios recorded with a variety of sensor modalities, including high-resolution RGB, grayscale stereo cameras, and a 3D laser scanner. Despite its popularity, the dataset itself does not contain ground truth for semantic segmentation. However, various researchers have manually annotated parts of the dataset to fit their necessities. Álvarez et al. generated ground truth for 323 images from the road detection challenge with three classes: road, vertical, and sky. Zhang et al. annotated 252 (140 for training and 112 for testing) acquisitions – RGB and Velodyne scans – from the tracking challenge for ten object categories: building, sky, road, vegetation, sidewalk, car, pedestrian, cyclist, sign/pole, and fence. Ros et al. labeled 170 training images and 46 testing images (from the visual odome
2,428 PAPERS • 120 BENCHMARKS
TUM RGB-D is an RGB-D dataset. It contains the color and depth images of a Microsoft Kinect sensor along the ground-truth trajectory of the sensor. The data was recorded at full frame rate (30 Hz) and sensor resolution (640x480). The ground-truth trajectory was obtained from a high-accuracy motion-capture system with eight high-speed tracking cameras (100 Hz).
141 PAPERS • NO BENCHMARKS YET
Virtual KITTI is a photo-realistic synthetic video dataset designed to learn and evaluate computer vision models for several video understanding tasks: object detection and multi-object tracking, scene-level and instance-level semantic segmentation, optical flow, and depth estimation.
99 PAPERS • 1 BENCHMARK
The Event-Camera Dataset is a collection of datasets with an event-based camera for high-speed robotics. The data also include intensity images, inertial measurements, and ground truth from a motion-capture system. An event-based camera is a revolutionary vision sensor with three key advantages: a measurement rate that is almost 1 million times faster than standard cameras, a latency of 1 microsecond, and a high dynamic range of 130 decibels (standard cameras only have 60 dB). These properties enable the design of a new class of algorithms for high-speed robotics, where standard cameras suffer from motion blur and high latency. All the data are released both as text files and binary (i.e., rosbag) files.
33 PAPERS • NO BENCHMARKS YET
A dataset for robot navigation task and more. The data is collected in photo-realistic simulation environments in the presence of various light conditions, weather and moving objects.
32 PAPERS • NO BENCHMARKS YET
Virtual KITTI 2 is an updated version of the well-known Virtual KITTI dataset which consists of 5 sequence clones from the KITTI tracking benchmark. In addition, the dataset provides different variants of these sequences such as modified weather conditions (e.g. fog, rain) or modified camera configurations (e.g. rotated by 15◦). For each sequence we provide multiple sets of images containing RGB, depth, class segmentation, instance segmentation, flow, and scene flow data. Camera parameters and poses as well as vehicle locations are available as well. In order to showcase some of the dataset’s capabilities, we ran multiple relevant experiments using state-of-the-art algorithms from the field of autonomous driving. The dataset is available for download at https://europe.naverlabs.com/Research/Computer-Vision/Proxy-Virtual-Worlds.
20 PAPERS • 1 BENCHMARK
The New College Data is a freely available dataset collected from a robot completing several loops outdoors around the New College campus in Oxford. The data includes odometry, laser scan, and visual information. The dataset URL is not working anymore.
16 PAPERS • NO BENCHMARKS YET
TUM monoVO is a dataset for evaluating the tracking accuracy of monocular Visual Odometry (VO) and SLAM methods. It contains 50 real-world sequences comprising over 100 minutes of video, recorded across different environments – ranging from narrow indoor corridors to wide outdoor scenes. All sequences contain mostly exploring camera motion, starting and ending at the same position: this allows to evaluate tracking accuracy via the accumulated drift from start to end, without requiring ground-truth for the full sequence. In contrast to existing datasets, all sequences are photometrically calibrated: the dataset creators provide the exposure times for each frame as reported by the sensor, the camera response function and the lens attenuation factors (vignetting).
10 PAPERS • NO BENCHMARKS YET
4Seasons is adataset covering seasonal and challenging perceptual conditions for autonomous driving.
9 PAPERS • NO BENCHMARKS YET
The endoscopic SLAM dataset (EndoSLAM) is a dataset for depth estimation approach for endoscopic videos. It consists of both ex-vivo and synthetically generated data. The ex-vivo part of the dataset includes standard as well as capsule endoscopy recordings. The dataset is divided into 35 sub-datasets. Specifically, 18, 5 and 12 sub-datasets exist for colon, small intestine and stomach respectively.
2 PAPERS • NO BENCHMARKS YET
A novel dataset with a diverse set of sequences in different scenes for evaluating VI odometry. It provides camera images with 1024x1024 resolution at 20 Hz, high dynamic range and photometric calibration.
ALTO is a vision-focused dataset for the development and benchmarking of Visual Place Recognition and Localization methods for Unmanned Aerial Vehicles. The dataset is composed of two long (approximately 150km and 260km) trajectories flown by a helicopter over Ohio and Pennsylvania, and it includes high precision GPS-INS ground truth location data, high precision accelerometer readings, laser altimeter readings, and RGB downward facing camera imagery.The dataset also comes with reference imagery over the flight paths, which makes this dataset suitable for VPR benchmarking and other tasks common in Localization, such as image registration and visual odometry.
1 PAPER • NO BENCHMARKS YET
A new underwater dataset that has been recorded in an harbor and provides several sequences with synchronized measurements from a monocular camera, a MEMS-IMU and a pressure sensor.
Brown Pedestrian Odometry Dataset (BPOD) is a dataset for benchmarking visual odometry algorithms in head-mounted pedestrian settings. This dataset was captured using synchronized global and rolling shutter stereo cameras in 12 diverse indoor and outdoor locations on Brown University's campus. Compared to existing datasets, BPOD contains more image blur and self-rotation, which are common in pedestrian odometry but rare elsewhere. Ground-truth trajectories are generated from stick-on markers placed along the pedestrian’s path, and the pedestrian's position is documented using a third-person video.
ConsInv is a stereo RGB + IMU dataset designed for Dynamic SLAM testing and contains two subsets:
This is the dataset for testing the robustness of various VO/VIO methods, acquired on reak UAV.
MinNav is a synthetic dataset based on the sandbox game Minecraft. The dataset uses several plug-in program to generate rendered image sequences with time-aligned depth maps, surface normal maps and camera poses. Thanks for the large game's community, there is an extremely large number of 3D open-world environment, users can find suitable scenes for shooting and build data sets through it and they can also build scenes in-game.
The dataset contains 32 sequences for the evaluation of VI motion estimation methods, totalling ∼80 min of data. The dataset covers challenging conditions (mainly illumination changes and low textured environments) in different degrees and a wide rage of scenarios (including corridors, parking, classrooms, halls, etc.) from two different buildings at the University of Malaga. In general, we provide at least two different sequences within the same scenario, with different illumination conditions or following different trajectories. All sequences were recorded with our VI sensor handheld, except a few that were recorded while mounted in a car.
0 PAPER • NO BENCHMARKS YET