The NCLT dataset is a large-scale, long-term autonomy dataset for robotics research collected on the University of Michigan’s North Campus. It consists of omnidirectional imagery, 3D lidar, planar lidar, GPS, and proprioceptive sensors for odometry, collected using a Segway robot. The dataset was collected to facilitate research on long-term autonomous operation in changing environments. It comprises 27 sessions spaced approximately biweekly over the course of 15 months. The sessions repeatedly explore the campus, both indoors and outdoors, on varying trajectories and at different times of day across all four seasons. This allows the dataset to capture many challenging elements, including moving obstacles (e.g., pedestrians, bicyclists, and cars), changing lighting, varying viewpoint, seasonal and weather changes (e.g., falling leaves and snow), and long-term structural changes caused by construction projects.
123 PAPERS • NO BENCHMARKS YET
Cambridge Landmarks is a large-scale outdoor visual relocalisation dataset captured around Cambridge University. It contains the original video, with extracted image frames labelled with their 6-DOF camera pose, and a visual reconstruction of the scene; a minimal sketch of parsing such pose labels follows this entry. If you use this data, please cite our paper: Alex Kendall, Matthew Grimes and Roberto Cipolla, "PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization." Proceedings of the International Conference on Computer Vision (ICCV), 2015.
109 PAPERS • NO BENCHMARKS YET
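The exact label-file layout can vary between releases, but assuming the commonly distributed text format of one image path followed by the camera position and an orientation quaternion per line (the file name, header length and column order below are assumptions, not guaranteed), a minimal parser might look like:

```python
from pathlib import Path

def load_poses(label_file):
    """Parse a Cambridge Landmarks-style pose label file.

    Assumes each data line is: <relative image path> x y z qw qx qy qz,
    preceded by a short free-text header (skipped here; assumption).
    """
    poses = {}
    lines = Path(label_file).read_text().splitlines()
    for line in lines[3:]:                  # first 3 lines assumed to be header text
        parts = line.split()
        if len(parts) != 8:
            continue
        image, values = parts[0], list(map(float, parts[1:]))
        poses[image] = {
            "position": values[:3],         # camera position in the reconstruction frame
            "quaternion": values[3:],       # orientation as (w, x, y, z)
        }
    return poses

# Hypothetical usage:
# poses = load_poses("KingsCollege/dataset_train.txt")
# print(len(poses), "training poses")
```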
Aachen Day-Night is a dataset designed for benchmarking 6DOF outdoor visual localization in changing conditions. It focuses on localizing high-quality night-time images against a day-time 3D model. The dataset contains 14,607 images captured under changing weather, seasonal, and day-night conditions.
92 PAPERS • 1 BENCHMARK
Structured3D is a large-scale photo-realistic dataset containing 3.5K house designs created by professional designers, with a variety of ground-truth 3D structure annotations and corresponding photo-realistic 2D renderings. The dataset consists of rendered images and ground-truth annotations (e.g., semantic, albedo, depth, surface normal, layout) under different lighting and furniture configurations.
84 PAPERS • 1 BENCHMARK
InLoc is a dataset with reference 6DoF poses for large-scale indoor localization. Query photographs are captured by mobile phones at a different time than the reference 3D map, thus presenting a realistic indoor localization scenario.
67 PAPERS • 1 BENCHMARK
The Oxford Radar RobotCar Dataset is a radar extension to The Oxford RobotCar Dataset. It has been extended with data from a Navtech CTS350-X Millimetre-Wave FMCW radar and Dual Velodyne HDL-32E LIDARs with optimised ground truth radar odometry for 280 km of driving around Oxford, UK (in addition to all sensors in the original Oxford RobotCar Dataset).
28 PAPERS • 4 BENCHMARKS
This dataset provides Light Detection and Ranging (LiDAR) data and stereo images, together with various position sensors, targeting a highly complex urban environment. It captures features of urban environments (e.g., metropolitan areas, complex buildings and residential areas). Both 2D and 3D LiDAR data, the two typical types of LiDAR sensors, are provided. Raw sensor data for vehicle navigation is provided in file form, and, for convenience, development tools are provided in the Robot Operating System (ROS) environment.
19 PAPERS • NO BENCHMARKS YET
The Zillow Indoor Dataset (ZInD) provides extensive visual data that covers a real-world distribution of unfurnished residential homes. It consists of primary 360º panoramas with annotated room layouts, windows, doors and openings (W/D/O), merged rooms, secondary localized panoramas, and final 2D floor plans. Beyond the raw capture, the representations include room layouts with W/D/O annotations, merged layouts, a 3D textured mesh, and the final 2D floor plan.
18 PAPERS • 1 BENCHMARK
A benchmark dataset that provides manual annotations for over 260 million laser scanning points, grouped into approximately 100,000 assets, from the 2015 Dublin LiDAR point cloud [12]. Objects are labelled into 13 classes using hierarchical levels of detail, from large (i.e., building, vegetation and ground) to refined (i.e., window, door and tree) elements.
15 PAPERS • NO BENCHMARKS YET
DeepLoc is a large-scale urban outdoor localization dataset. It currently comprises one scene spanning an area of 110 x 130 m, which a robot traverses multiple times with different driving patterns. The dataset creators use a LiDAR-based SLAM system with sub-centimeter and sub-degree accuracy to compute the pose labels provided as ground truth. Poses in the dataset are spaced approximately 0.5 m apart, which is twice as dense as in other relocalization datasets.
8 PAPERS • NO BENCHMARKS YET
The Aachen Day-Night v1.1 dataset is an extended version of the original Aachen Day-Night dataset. Besides the original query images, it contains 93 additional nighttime queries. It also uses a larger 3D model built from additional images, extracted from video sequences captured with different cameras. Please refer to "Reference Pose Generation for Long-term Visual Localization via Learned Features and View Synthesis" for more information.
7 PAPERS • 1 BENCHMARK
The Indoor-6 dataset was created from multiple sessions captured in six indoor scenes over multiple days. The pseudo ground truth (pGT) 3D point clouds and camera poses for each scene are computed using COLMAP. The training data uses only the COLMAP reconstruction built from the training images. Compared to 7-Scenes, the scenes in Indoor-6 are larger, have multiple rooms, and contain illumination variations, as the images span multiple days and different times of day.
3 PAPERS • NO BENCHMARKS YET
Long-Term Visual Localization provides benchmark datasets aimed at evaluating 6-DoF pose estimation accuracy under large appearance variations caused by changes in season (summer, winter, spring, etc.) and illumination (dawn, day, sunset, night). Each dataset consists of a set of reference images, together with their corresponding ground-truth poses, and a set of query images.
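The benchmark reports how many queries are localized within given position and orientation error thresholds. As a reference, here is a minimal sketch of how such errors are commonly computed from an estimated and a ground-truth pose (an illustration, not the benchmark's official evaluation code):

```python
import numpy as np

def pose_errors(R_est, t_est, R_gt, t_gt):
    """Translation and rotation error between an estimated and a ground-truth pose.

    R_* are 3x3 rotation matrices, t_* are 3-vectors; both poses are assumed to be
    expressed in the same convention (e.g. camera-to-world).
    """
    t_err = np.linalg.norm(t_est - t_gt)                          # metres
    cos_angle = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0            # from the relative rotation
    r_err = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))  # degrees
    return t_err, r_err

# A query then counts as correctly localized at, e.g., a (0.25 m, 2 deg) threshold
# if t_err <= 0.25 and r_err <= 2.0.
```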
The PhotoSynth (PS) dataset for patch matching consists of a total of 30 scenes, with 25 scenes for training and 5 for validation. Image pairs are captured under different illumination conditions, at different scales, and from different viewpoints.
A new cross-season, scaleless monocular depth prediction dataset derived from the CMU Visual Localization dataset through structure from motion.
Cross-View Geo-Localisation within urban regions is challenging, in part due to the lack of geo-spatial structuring within current datasets and techniques. We propose utilising graph representations to model sequences of local observations and the connectivity of the target location. Modelling the data as a graph enables generating previously unseen sequences by sampling with new parameter configurations (a minimal sampling sketch follows this entry). SpaGBOL contains 98,855 panoramic streetview images across different seasons, and 19,771 corresponding satellite images from 10 mostly densely populated international cities. This translates to 5 panoramic images and one satellite image per graph node.
3 PAPERS • 1 BENCHMARK
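As a rough illustration of the graph-and-sampling idea described above, here is a minimal sketch using networkx; the node attributes, file names and walk-sampling parameters are entirely hypothetical and do not reflect SpaGBOL's actual API or data layout:

```python
import random
import networkx as nx

# Hypothetical toy graph: nodes are junction locations, each carrying one satellite
# image and a few panoramic streetview images; edges encode road connectivity.
G = nx.Graph()
G.add_nodes_from([
    (0, {"satellite": "sat_000.png", "panoramas": ["pano_000_a.png", "pano_000_b.png"]}),
    (1, {"satellite": "sat_001.png", "panoramas": ["pano_001_a.png"]}),
    (2, {"satellite": "sat_002.png", "panoramas": ["pano_002_a.png"]}),
])
G.add_edges_from([(0, 1), (1, 2)])

def sample_walk(graph, length, rng=random):
    """Sample a random walk over connected nodes to form a sequence of observations."""
    node = rng.choice(list(graph.nodes))
    walk = [node]
    while len(walk) < length:
        neighbours = list(graph.neighbors(walk[-1]))
        if not neighbours:
            break
        walk.append(rng.choice(neighbours))
    return walk

# Each sampled walk yields a previously unseen sequence of (panorama, satellite) pairs.
print(sample_walk(G, length=3))
```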
This vision and perception research dataset was collected in Rome and features RGB data, 3D point clouds, IMU, and GPS data. We introduce a new benchmark targeting visual odometry and SLAM, to advance research in autonomous robotics and computer vision. This work complements existing datasets by simultaneously addressing several issues, such as environment diversity, motion patterns, and sensor frequency. It uses up-to-date devices and presents effective procedures to accurately calibrate the intrinsics and extrinsics of the sensors while addressing temporal synchronization. The recordings cover multi-floor buildings, gardens, and urban and highway scenarios. Combining handheld and car-based data collection, our setup can simulate any robot (quadrupeds, quadrotors, autonomous vehicles). The dataset includes accurate 6-DoF ground truth based on a novel methodology that refines the RTK-GPS estimate with LiDAR point clouds through Bundle Adjustment. All sequences divi
The Virtual Gallery dataset is a synthetic dataset that targets multiple challenges such as varying lighting conditions and different occlusion levels for various tasks such as depth estimation, instance segmentation and visual localization.
2 PAPERS • NO BENCHMARKS YET
To study data-scarcity mitigation for learning-based visual localization methods via sim-to-real transfer, we curate and present the CrossLoc benchmark datasets: multimodal aerial sim-to-real data for flights above natural and urban terrains. Unlike previous computer vision datasets focusing on localization in a single domain (mostly real RGB images), the provided benchmark datasets include various multimodal synthetic cues paired with all real photos. Complementary to the paired real and synthetic data, we offer rich synthetic data that efficiently fills the flight envelope volume in the vicinity of the real data.
1 PAPER • NO BENCHMARKS YET
Danish Airs and Grounds (DAG) is a large collection of street-level and aerial images targeting matching across these two viewpoints. Its main challenge lies in the extreme viewing-angle difference between query and reference images, with consequent changes in illumination and perspective. The dataset is larger and more diverse than currently available public data, including more than 50 km of road in urban, suburban and rural areas. All images are associated with accurate 6-DoF metadata that allows benchmarking of visual localization methods.
DeepLocCross is a localization dataset that contains RGB-D stereo images captured at 1280 x 720 pixels at a rate of 20 Hz. The ground-truth pose labels are generated using a LiDAR-based SLAM system. In addition to the 6-DoF localization poses of the robot, the dataset contains tracked detections of the observable dynamic objects. Each tracked object is identified by a unique track ID, spatial coordinates, velocity and orientation angle. Furthermore, as the dataset contains multiple pedestrian crossings, labels indicating the safety of crossing are provided at each intersection. The dataset consists of seven training sequences with a total of 2264 images, and three testing sequences with a total of 930 images. The dynamic nature of the surrounding environment in which the dataset was captured makes the tasks of localization and visual odometry estimation extremely challenging, due to the varying weather conditions, presence of shadows and motion blur caused by the mov
FAS100K is a large-scale visual localization dataset. It comprises two traverses of 238 km and 130 km respectively, where the latter is a partial repeat of the former. The data was collected using stereo cameras in Australia under sunny daytime conditions, and covers a variety of road and environment types including urban and rural areas. The raw image data from one of the cameras, streaming at 5 Hz, constitutes 63,650 and 34,497 image frames for the two traverses respectively.
A new dataset containing over 550K pairs of RGB and aerial LiDAR depth images, covering a 143 km² area.
The NAVER LABS localization datasets are 5 new indoor datasets for visual localization in challenging real-world environments. They were captured in a large shopping mall and a large metro station in Seoul, South Korea, using a dedicated mapping platform consisting of 10 cameras and 2 laser scanners. In order to obtain accurate ground-truth camera poses, we used a robust LiDAR SLAM which provides initial poses that are then refined using a novel structure-from-motion based optimization. The datasets are provided in the kapture format and contain about 130k images as well as 6DoF camera poses for training and validation. We also provide sparse LiDAR-based depth maps for the training images. The poses of the test set are withheld so as not to bias the benchmark.
This dataset builds upon the SpaGBOL dataset - a graph-based dataset covering numerous cities across the globe for the purpose of structured city-scale Cross-View Geo-Localisation (CVGL).
This synthetic event dataset is used in Robust e-NeRF to study the collective effect of camera speed profile, contrast threshold variation and refractory period on the quality of NeRF reconstruction from a moving event camera. It is simulated using an improved version of ESIM with three different camera configurations of increasing difficulty levels (i.e. easy, medium and hard) on seven Realistic Synthetic $360^{\circ}$ scenes (adopted in the synthetic experiments of NeRF), resulting in a total of 21 sequence recordings. Please refer to the Robust e-NeRF paper for more details.
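For context on the contrast threshold and refractory period mentioned above, the idealized per-pixel event generation model that ESIM-style simulators build on can be sketched as follows; this is a simplified illustration (one event per sample, arbitrary threshold and period values), not the dataset's actual simulator code:

```python
import numpy as np

def simulate_pixel_events(timestamps, log_intensities, C_pos=0.25, C_neg=0.25, refractory=1e-3):
    """Idealized per-pixel event generation.

    An event of polarity +1/-1 fires whenever the log intensity deviates from its value
    at the last event by more than the contrast threshold (C_pos / C_neg), after which
    the pixel is blind for the refractory period. Real simulators additionally
    interpolate threshold-crossing times between samples; that is omitted here.
    """
    events = []
    ref_log = log_intensities[0]      # log intensity at the last emitted event
    last_event_t = -np.inf
    for t, log_i in zip(timestamps, log_intensities):
        if t - last_event_t < refractory:
            continue                  # pixel still within its refractory period
        delta = log_i - ref_log
        if delta >= C_pos:
            events.append((t, +1))
            ref_log, last_event_t = log_i, t
        elif delta <= -C_neg:
            events.append((t, -1))
            ref_log, last_event_t = log_i, t
    return events
```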