This dataset presents a vision and perception research dataset collected in Rome, featuring RGB data, 3D point clouds, IMU, and GPS data. We introduce a new benchmark targeting visual odometry and SLAM, to advance the research in autonomous robotics and computer vision. This work complements existing datasets by simultaneously addressing several issues, such as environment diversity, motion patterns, and sensor frequency. It uses up-to-date devices and presents effective procedures to accurately calibrate the intrinsic and extrinsic of the sensors while addressing temporal synchronization. During recording, we cover multi-floor buildings, gardens, urban and highway scenarios. Combining handheld and car-based data collections, our setup can simulate any robot (quadrupeds, quadrotors, autonomous vehicles). The dataset includes an accurate 6-dof ground truth based on a novel methodology that refines the RTK-GPS estimate with LiDAR point clouds through Bundle Adjustment. All sequences divi
2 PAPERS • NO BENCHMARKS YET
Estimating camera motion in deformable scenes poses a complex and open research challenge. Most existing non-rigid structure from motion techniques assume to observe also static scene parts besides deforming scene parts in order to establish an anchoring reference. However, this assumption does not hold true in certain relevant application cases such as endoscopies. To tackle this issue with a common benchmark, we introduce the Drunkard’s Dataset, a challenging collection of synthetic data targeting visual navigation and reconstruction in deformable environments. This dataset is the first large set of exploratory camera trajectories with ground truth inside 3D scenes where every surface exhibits non-rigid deformations over time. Simulations in realistic 3D buildings lets us obtain a vast amount of data and ground truth labels, including camera poses, RGB images and depth, optical flow and normal maps at high resolution and quality.
1 PAPER • 1 BENCHMARK
ConsInv is a stereo RGB + IMU dataset designed for Dynamic SLAM testing and contains two subsets:
1 PAPER • NO BENCHMARKS YET
TUM monoVO is a dataset for evaluating the tracking accuracy of monocular Visual Odometry (VO) and SLAM methods. It contains 50 real-world sequences comprising over 100 minutes of video, recorded across different environments – ranging from narrow indoor corridors to wide outdoor scenes. All sequences contain mostly exploring camera motion, starting and ending at the same position: this allows to evaluate tracking accuracy via the accumulated drift from start to end, without requiring ground-truth for the full sequence. In contrast to existing datasets, all sequences are photometrically calibrated: the dataset creators provide the exposure times for each frame as reported by the sensor, the camera response function and the lens attenuation factors (vignetting).
11 PAPERS • NO BENCHMARKS YET
This dataset consists of odometer or speedometer images of bike and car vehicles.
0 PAPER • NO BENCHMARKS YET
The endoscopic SLAM dataset (EndoSLAM) is a dataset for depth estimation approach for endoscopic videos. It consists of both ex-vivo and synthetically generated data. The ex-vivo part of the dataset includes standard as well as capsule endoscopy recordings. The dataset is divided into 35 sub-datasets. Specifically, 18, 5 and 12 sub-datasets exist for colon, small intestine and stomach respectively.
3 PAPERS • NO BENCHMARKS YET