KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) is one of the most popular datasets for mobile robotics and autonomous driving. It consists of hours of traffic scenarios recorded with a variety of sensor modalities, including high-resolution RGB, grayscale stereo cameras, and a 3D laser scanner. Despite its popularity, the dataset itself does not contain ground truth for semantic segmentation. However, various researchers have manually annotated parts of the dataset to fit their needs. Álvarez et al. generated ground truth for 323 images from the road detection challenge with three classes: road, vertical, and sky. Zhang et al. annotated 252 (140 for training and 112 for testing) acquisitions – RGB and Velodyne scans – from the tracking challenge for ten object categories: building, sky, road, vegetation, sidewalk, car, pedestrian, cyclist, sign/pole, and fence. Ros et al. labeled 170 training images and 46 testing images from the visual odometry challenge.
3,588 PAPERS • 143 BENCHMARKS
The nuScenes dataset is a large-scale autonomous driving dataset. The dataset has 3D bounding boxes for 1000 scenes collected in Boston and Singapore. Each scene is 20 seconds long and annotated at 2 Hz. This results in a total of 28,130 samples for training, 6,019 samples for validation, and 6,008 samples for testing. The dataset has the full autonomous vehicle data suite: a 32-beam LiDAR, 6 cameras, and 5 radars with complete 360° coverage. The 3D object detection challenge evaluates performance on 10 classes: cars, trucks, buses, trailers, construction vehicles, pedestrians, motorcycles, bicycles, traffic cones, and barriers.
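A minimal loading sketch using the official nuscenes-devkit (assuming it is installed, e.g. via pip install nuscenes-devkit, and that the data sits under a hypothetical /data/sets/nuscenes path); it iterates the 2 Hz annotated samples of one scene:

```python
# Sketch: iterating nuScenes keyframe samples with the official devkit.
# Assumes `pip install nuscenes-devkit`; the dataroot path is a hypothetical placeholder.
from nuscenes.nuscenes import NuScenes

nusc = NuScenes(version='v1.0-trainval',         # use 'v1.0-mini' for the small teaser split
                dataroot='/data/sets/nuscenes',  # hypothetical local path
                verbose=True)

scene = nusc.scene[0]
token = scene['first_sample_token']
while token:                                      # samples are the 2 Hz annotated keyframes
    sample = nusc.get('sample', token)
    lidar_token = sample['data']['LIDAR_TOP']     # sensor-data record of the 32-beam LiDAR
    num_boxes = len(sample['anns'])               # 3D bounding-box annotation tokens
    print(f"{num_boxes} annotations at timestamp {sample['timestamp']}")
    token = sample['next']                        # empty string after the last sample
```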
2,025 PAPERS • 22 BENCHMARKS
SemanticKITTI is a large-scale outdoor-scene dataset for point cloud semantic segmentation. It is derived from the KITTI Vision Odometry Benchmark, which it extends with dense point-wise annotations for the complete 360° field of view of the employed automotive LiDAR. The dataset consists of 22 sequences. Overall, it provides 23,201 point clouds for training and 20,351 for testing.
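A minimal sketch of reading one scan and its point-wise labels, assuming the standard binary layout used by the benchmark (four float32 values x, y, z, remission per point; one uint32 label per point whose lower 16 bits encode the semantic class and upper 16 bits the instance id); the file paths are hypothetical placeholders:

```python
# Sketch: reading a SemanticKITTI scan and its point-wise labels.
# Paths are hypothetical; the binary layout follows the benchmark's documented format.
import numpy as np

scan_path = 'sequences/08/velodyne/000000.bin'    # hypothetical example scan
label_path = 'sequences/08/labels/000000.label'   # hypothetical matching labels

points = np.fromfile(scan_path, dtype=np.float32).reshape(-1, 4)   # x, y, z, remission
raw_labels = np.fromfile(label_path, dtype=np.uint32)

semantic = raw_labels & 0xFFFF    # lower 16 bits: semantic class id
instance = raw_labels >> 16       # upper 16 bits: instance id

assert points.shape[0] == raw_labels.shape[0]
print(points.shape[0], 'points,', np.unique(semantic).size, 'semantic classes in this scan')
```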
637 PAPERS • 10 BENCHMARKS
The Waymo Open Dataset comprises high-resolution sensor data collected by autonomous vehicles operated by the Waymo Driver in a wide variety of conditions.
466 PAPERS • 16 BENCHMARKS
Argoverse is a tracking benchmark with over 30K scenarios collected in Pittsburgh and Miami. Each scenario is a sequence of frames sampled at 10 Hz. Each sequence has an object of interest called the "agent", and the task is to predict the future locations of agents over a 3-second horizon. The sequences are split into training, validation, and test sets, which contain 205,942, 39,472, and 78,143 sequences respectively. These splits have no geographical overlap.
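At 10 Hz, the 3-second horizon corresponds to 30 future positions per agent. A minimal constant-velocity baseline for that setting is sketched below; it is illustrative only (not part of the Argoverse toolkit), and the array shapes are assumptions:

```python
# Sketch: constant-velocity forecast for a 3 s horizon sampled at 10 Hz (30 steps).
# Illustrative baseline only; not the Argoverse API.
import numpy as np

def constant_velocity_forecast(observed_xy: np.ndarray, horizon: int = 30) -> np.ndarray:
    """observed_xy: (T, 2) past agent positions at 10 Hz; returns (horizon, 2) future positions."""
    step_velocity = observed_xy[-1] - observed_xy[-2]      # displacement over the last 0.1 s
    steps = np.arange(1, horizon + 1).reshape(-1, 1)
    return observed_xy[-1] + steps * step_velocity

# Toy example: an agent moving ~1 m per step along x.
past = np.stack([np.arange(20, dtype=float), np.zeros(20)], axis=1)
future = constant_velocity_forecast(past)
print(future.shape)   # (30, 2)
```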
375 PAPERS • 6 BENCHMARKS
KITTI-360 is a large-scale dataset that contains rich sensory information and full annotations. It is the successor of the popular KITTI dataset, providing more comprehensive semantic/instance labels in 2D and 3D, richer 360 degree sensory information (fisheye images and pushbroom laser scans), very accurate and geo-localized vehicle and camera poses, and a series of new challenging benchmarks.
228 PAPERS • 7 BENCHMARKS
Argoverse 2 (AV2) is a collection of three datasets for perception and forecasting research in the self-driving domain. The annotated Sensor Dataset contains 1,000 sequences of multimodal data, encompassing high-resolution imagery from seven ring cameras and two stereo cameras in addition to lidar point clouds and 6-DOF map-aligned pose. Sequences contain 3D cuboid annotations for 26 object categories, all of which are sufficiently sampled to support training and evaluation of 3D perception models. The Lidar Dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose. This dataset is the largest ever collection of lidar sensor data and supports self-supervised learning and the emerging task of point cloud forecasting. Finally, the Motion Forecasting Dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene. Models are tasked with the prediction of future motion.
175 PAPERS • 3 BENCHMARKS
ONCE (One millioN sCenEs) is a dataset for 3D object detection in the autonomous driving scenario. The ONCE dataset consists of 1 million LiDAR scenes and 7 million corresponding camera images. The data is selected from 144 driving hours, which is 20x longer than in other available 3D autonomous driving datasets such as nuScenes and Waymo, and it is collected across a range of different areas, periods, and weather conditions.
80 PAPERS • 1 BENCHMARK
The SemanticPOSS dataset for 3D semantic segmentation contains 2,988 varied and complex LiDAR scans with a large quantity of dynamic instances. The data was collected at Peking University and uses the same data format as SemanticKITTI.
70 PAPERS • 2 BENCHMARKS
The Newer College Dataset is a large dataset with a variety of mobile mapping sensors collected using a handheld device carried at typical walking speeds for nearly 2.2 km through New College, Oxford. The dataset includes data from two commercially available devices - a stereoscopic-inertial camera and a multi-beam 3D LiDAR, which also provides inertial measurements. Additionally, the authors used a tripod-mounted survey grade LiDAR scanner to capture a detailed millimeter-accurate 3D map of the test location (containing ∼290 million points).
64 PAPERS • NO BENCHMARKS YET
A large-scale V2X perception dataset using CARLA and OpenCDA
36 PAPERS • 1 BENCHMARK
V2V4Real is a large-scale, real-world, multi-modal dataset for V2V perception. The data is collected by two vehicles equipped with multi-modal sensors driving together through diverse scenarios. It covers a driving area of 410 km and comprises 20K LiDAR frames, 40K RGB frames, 240K annotated 3D bounding boxes for 5 classes, and HD maps that cover all the driving routes.
33 PAPERS • NO BENCHMARKS YET
We introduce an object detection dataset in challenging adverse weather conditions covering 12,000 samples in real-world driving scenes and 1,500 samples in controlled weather conditions within a fog chamber. The dataset includes different weather conditions like fog, snow, and rain and was acquired over more than 10,000 km of driving in northern Europe. In total, 100k objects were labeled with accurate 2D and 3D bounding boxes. The main contributions of this dataset are:
- We provide a proving ground for a broad range of algorithms covering signal enhancement, domain adaptation, object detection, and multi-modal sensor fusion, focusing on the learning of robust redundancies between sensors, especially if they fail asymmetrically in different weather conditions.
- The dataset was created with the initial intention to showcase methods that learn robust redundancies between sensors and enable raw-data sensor fusion in case of asymmetric sensor failure.
32 PAPERS • 2 BENCHMARKS
The Multi Vehicle Stereo Event Camera (MVSEC) dataset is a collection of data designed for the development of novel 3D perception algorithms for event-based cameras. Stereo event data is collected from car, motorbike, hexacopter, and handheld platforms, and fused with lidar, IMU, motion capture, and GPS to provide ground-truth pose and depth images.
27 PAPERS • 2 BENCHMARKS
We present the Dayton Annotated LiDAR Earth Scan (DALES) data set, a new large-scale aerial LiDAR data set with over half a billion hand-labeled points spanning 10 square kilometers of area and eight object categories. Large annotated point cloud data sets have become the standard for evaluating deep learning methods. However, most of the existing data sets focus on data collected from a mobile or terrestrial scanner, with few focusing on aerial data. Point cloud data collected from an Aerial Laser Scanner (ALS) presents a new set of challenges and applications in areas such as 3D urban modeling and large-scale surveillance. DALES is the most extensive publicly available ALS data set, with over 400 times the number of points and six times the resolution of other currently available annotated aerial point cloud data sets. This data set gives a critical number of expert-verified hand-labeled points for the evaluation of new 3D deep learning algorithms.
26 PAPERS • 2 BENCHMARKS
SynLiDAR is a large-scale synthetic LiDAR sequential point cloud dataset with point-wise annotations. It contains 13 sequences of LiDAR point clouds with around 20k scans (over 19 billion points and 32 semantic classes), collected from virtual urban cities, suburban towns, neighborhoods, and harbors.
17 PAPERS • 2 BENCHMARKS
DELIVER is an arbitrary-modal segmentation benchmark covering Depth, LiDAR, multiple Views, Events, and RGB. Beyond this, the dataset also covers four severe weather conditions as well as five sensor-failure cases to exploit modal complementarity and resolve partial outages. It is designed for the task of arbitrary-modal semantic segmentation.
16 PAPERS • 3 BENCHMARKS
Have you wondered how autonomous mobile robots should share space with humans in public spaces? Are you interested in developing autonomous mobile robots that can navigate within human crowds in a socially compliant manner? Do you want to analyze human reactions and behaviors in the presence of mobile robots of different morphologies?
16 PAPERS • NO BENCHMARKS YET
A dataset capturing diverse visual data formats targeting varying luminance conditions, recorded with alternative vision sensors, handheld or mounted on a car, repeatedly in the same space under different conditions.
A novel benchmark dataset that includes a manually annotated point cloud of over 260 million laser scanning points, grouped into approximately 100,000 assets, from the 2015 Dublin LiDAR point cloud [12]. Objects are labelled into 13 classes using hierarchical levels of detail, from large elements (i.e., building, vegetation, and ground) to refined ones (i.e., window, door, and tree).
15 PAPERS • NO BENCHMARKS YET
Many existing datasets for lidar place recognition are solely representative of structured urban environments, and have recently been saturated in performance by deep learning based approaches. Natural and unstructured environments present many additional challenges for the task of long-term localisation, but these environments are not represented in currently available datasets. To address this we introduce Wild-Places, a challenging large-scale dataset for lidar place recognition in unstructured, natural environments. Wild-Places contains eight lidar sequences collected with a handheld sensor payload over the course of fourteen months, containing a total of 63K undistorted lidar submaps along with accurate 6DoF ground truth. This dataset contains multiple revisits both within and between sequences, allowing for both intra-sequence (i.e., loop closure detection) and inter-sequence (i.e., re-localisation) tasks. We also benchmark several state-of-the-art approaches on this dataset.
14 PAPERS • NO BENCHMARKS YET
SLOPER4D is a novel scene-aware dataset collected in large urban environments to facilitate research on global human pose estimation (GHPE) with human-scene interaction in the wild. It consists of 15 sequences of human motions, each of which has a trajectory length of more than 200 meters (up to 1,300 meters) and covers an area of more than 2,000 m² (up to 13,000 m²), including more than 100K LiDAR frames, 300K video frames, and 500K IMU-based motion frames. With SLOPER4D, we provide a detailed and thorough analysis of two critical tasks, camera-based 3D HPE and LiDAR-based 3D HPE in urban environments, and benchmark a new task, GHPE.
11 PAPERS • 1 BENCHMARK
Robust detection and tracking of objects is crucial for the deployment of autonomous vehicle technology. Image-based benchmark datasets have driven development in computer vision tasks such as object detection, tracking, and segmentation of agents in the environment. Most autonomous vehicles, however, carry a combination of cameras and range sensors such as lidar and radar. As machine learning based methods for detection and tracking become more prevalent, there is a need to train and evaluate such methods on datasets containing range sensor data along with images. In this work we present nuTonomy scenes (nuScenes), the first dataset to carry the full autonomous vehicle sensor suite: 6 cameras, 5 radars and 1 lidar, all with full 360 degree field of view. nuScenes comprises 1000 scenes, each 20s long and fully annotated with 3D bounding boxes for 23 classes and 8 attributes. It has 7x as many annotations and 100x as many images as the pioneering KITTI dataset. We define novel 3D detection and tracking metrics.
11 PAPERS • 2 BENCHMARKS
SemanticSTF is an adverse-weather point cloud dataset that provides dense point-level annotations and allows studying 3D semantic segmentation (3DSS) under various adverse weather conditions. It contains 2,076 scans captured by a Velodyne HDL64 S3D LiDAR sensor from STF, covering various adverse weather conditions: 694 snowy, 637 dense-foggy, 631 light-foggy, and 114 rainy scans (all rainy LiDAR scans in STF).
10 PAPERS • 1 BENCHMARK
Collected in the snow belt region of Michigan's Upper Peninsula, WADS is the first multi-modal dataset featuring dense point-wise labeled sequential LiDAR scans collected in severe winter weather.
8 PAPERS • NO BENCHMARKS YET
The Zenseact Open Dataset (ZOD) is a large-scale and diverse multi-modal autonomous driving (AD) dataset, created by researchers at Zenseact. It was collected over a 2-year period in 14 different European countries, using a fleet of vehicles equipped with a full sensor suite. The dataset consists of three subsets: Frames, Sequences, and Drives, designed to encompass both data diversity and support for spatiotemporal learning, sensor fusion, localization, and mapping.
BAAI-VANJEE is a dataset for benchmarking and training various computer vision tasks such as 2D/3D object detection and multi-sensor fusion. The BAAI-VANJEE roadside dataset consists of LiDAR data and RGB images collected by the VANJEE smart base station placed on the roadside at a height of about 4.5 m. The dataset contains 2,500 frames of LiDAR data and 5,000 frames of RGB images, of which 20% were collected at the same time. It also contains 12 classes of objects, 74K 3D object annotations, and 105K 2D object annotations.
6 PAPERS • NO BENCHMARKS YET
Tasks. In moving object segmentation of point cloud sequences, one has to provide motion labels for each point of the test sequences 11-21. The input to all evaluated methods is a list of coordinates of the three-dimensional points along with their remission, i.e., the strength of the reflected laser beam, which depends on the properties of the surface that was hit. Each method should then output a label for each point of a scan, i.e., one full turn of the rotating LiDAR sensor. Here, we only distinguish between static and moving object classes.
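A minimal sketch of producing the expected per-point output for one scan, assuming scans in the SemanticKITTI binary layout; the predictor, the paths, and the 0/1 label encoding are hypothetical placeholders (the label ids accepted for submission are defined by the benchmark's configuration and are not shown here):

```python
# Sketch: per-point static/moving prediction for one scan of a test sequence.
# Predictor, paths, and the 0/1 encoding are placeholders, not the official format.
import numpy as np

def predict_moving_mask(points: np.ndarray) -> np.ndarray:
    """Placeholder predictor: (N, 4) points (x, y, z, remission) -> (N,) boolean moving mask."""
    return np.zeros(points.shape[0], dtype=bool)   # trivially predicts every point as static

points = np.fromfile('sequences/11/velodyne/000000.bin', dtype=np.float32).reshape(-1, 4)
moving = predict_moving_mask(points)

labels = np.where(moving, 1, 0).astype(np.uint32)   # one label per point, same order as the scan
labels.tofile('predictions/sequences/11/predictions/000000.label')   # hypothetical output path
```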
DurLAR is a high-fidelity 128-channel 3D LiDAR dataset with panoramic ambient (near infrared) and reflectivity imagery for multi-modal autonomous driving applications. Compared to existing autonomous driving task datasets, DurLAR offers several novel features.
5 PAPERS • NO BENCHMARKS YET
SYNS-Patches is a subset of the SYNS dataset. The original SYNS is composed of aligned image and LiDAR panoramas from 92 different scenes belonging to a wide variety of environments, such as Agriculture, Natural (e.g., forests and fields), Residential, Industrial, and Indoor. SYNS-Patches contains the patches from each scene extracted at eye level at 20-degree intervals of a full horizontal rotation. This results in 18 images per scene and a total dataset size of 1,656 images.
The Argoverse 2 Lidar Dataset is a collection of 20,000 scenarios with lidar sensor data, HD maps, and ego-vehicle pose. It does not include imagery or 3D annotations. The dataset is designed to support research into self-supervised learning in the lidar domain, as well as point cloud forecasting.
4 PAPERS • NO BENCHMARKS YET
The FieldSAFE dataset is a multi-modal dataset for obstacle detection in agriculture. It comprises 2 hours of raw sensor data from a tractor-mounted sensor system in a grass mowing scenario in Denmark, October 2016.
4 PAPERS • 1 BENCHMARK
Human-M3 is an outdoor multi-modal, multi-view, multi-person human pose dataset that includes not only multi-view RGB videos of outdoor scenes but also corresponding point clouds.
MUSES offers 2500 multi-modal scenes, evenly distributed across various combinations of weather conditions (clear, fog, rain, and snow) and types of illumination (daytime, nighttime). Each image includes high-quality 2D pixel-level panoptic annotations and class-level and novel instance-level uncertainty annotations. Further, each adverse-condition image has a corresponding image of the same scene taken under clear-weather, daytime conditions. The annotation process for MUSES utilizes all available sensor data, allowing the annotators to also reliably label degraded image regions that are still discernible in other modalities. This results in better pixel coverage in the annotations and creates a more challenging evaluation setup.
4 PAPERS • 5 BENCHMARKS
The Situated Corpus Of Understanding Transactions (SCOUT) is a multi-modal collection of human-robot dialogue in the task domain of collaborative exploration. The corpus was constructed from multi-phased Wizard-of-Oz experiments where human participants gave verbal instructions to a remotely-located robot to move and gather information about its surroundings. Each dialogue involved a human Commander, a Dialogue Manager (DM), and a Robot Navigator (RN), and took place in physical or simulated environments.
This dataset contains a variety of common urban road objects scanned with a Velodyne HDL-64E LIDAR, collected in the CBD of Sydney, Australia. There are 631 individual scans of objects across classes of vehicles, pedestrians, signs and trees.
aiMotive dataset is a multimodal dataset for robust autonomous driving with long-range perception. The dataset consists of 176 scenes with synchronized and calibrated LiDAR, camera, and radar sensors covering a 360-degree field of view. The collected data was captured in highway, urban, and suburban areas during daytime, night, and rain and is annotated with 3D bounding boxes with consistent identifiers across frames.
AnoVox is a large-scale benchmark for ANOmaly detection in autonomous driving. AnoVox incorporates multimodal sensor data and spatial VOXel ground truth, allowing for the comparison of methods independent of their used sensor. AnoVox contains both content and temporal anomalies.
3 PAPERS • NO BENCHMARKS YET
Ford Campus Vision and Lidar Data Set is a dataset collected by an autonomous ground vehicle testbed, based upon a modified Ford F-250 pickup truck. The vehicle is outfitted with a professional (Applanix POS LV) and consumer (Xsens MTI-G) Inertial Measuring Unit (IMU), a Velodyne 3D-lidar scanner, two push-broom forward looking Riegl lidars, and a Point Grey Ladybug3 omnidirectional camera system.
HelixNet is a large-scale, open-access LiDAR dataset intended for the evaluation of real-time semantic segmentation algorithms. In contrast to other large-scale datasets, HelixNet includes fine-grained data about the sensor's rotation and position, as well as the points' release times.
3 PAPERS • 1 BENCHMARK
A dataset to evaluate radar localization in diverse environments. Download: https://drive.google.com/drive/folders/1uATfrAe-KHlz29e-Ul8qUbUKwPxBFIhP
The TbV dataset is a large-scale dataset created to allow the community to improve the state of the art in machine learning tasks related to mapping that are vital for self-driving.
This dataset is a vision and perception research dataset collected in Rome, featuring RGB data, 3D point clouds, IMU, and GPS data. We introduce a new benchmark targeting visual odometry and SLAM to advance research in autonomous robotics and computer vision. This work complements existing datasets by simultaneously addressing several issues, such as environment diversity, motion patterns, and sensor frequency. It uses up-to-date devices and presents effective procedures to accurately calibrate the intrinsics and extrinsics of the sensors while addressing temporal synchronization. During recording, we cover multi-floor buildings, gardens, urban and highway scenarios. Combining handheld and car-based data collections, our setup can simulate any robot (quadrupeds, quadrotors, autonomous vehicles). The dataset includes an accurate 6-DoF ground truth based on a novel methodology that refines the RTK-GPS estimate with LiDAR point clouds through Bundle Adjustment.
WildScenes is a bi-modal benchmark dataset consisting of multiple large-scale, sequential traversals in natural environments, including semantic annotations in high-resolution 2D images and dense 3D LiDAR point clouds, and accurate 6-DoF pose information. The data is (1) trajectory-centric with accurate localization and globally aligned point clouds, (2) calibrated and synchronized to support bi-modal training and inference, and (3) containing different natural environments over 6 months to support research on domain adaptation. We introduce benchmarks on 2D and 3D semantic segmentation and evaluate a variety of recent deep-learning techniques to demonstrate the challenges in semantic segmentation in natural environments. We propose train-val-test splits for standard benchmarks as well as domain adaptation benchmarks and utilize an automated split generation technique to ensure the balance of class label distributions. The WildScenes benchmark webpage is https://csiro-robotics.github.i
Our trajectory dataset consists of camera-based images, LiDAR scanned point clouds, and manually annotated trajectories. It is collected under various lighting conditions and traffic densities in Beijing, China. More specifically, it contains highly complicated traffic flows mixed with vehicles, riders, and pedestrians.
2 PAPERS • 1 BENCHMARK
The Argoverse 2 Map Change Dataset is a collection of 1,000 scenarios with ring camera imagery, lidar, and HD maps. Two hundred of the scenarios include changes in the real-world environment that are not yet reflected in the HD map, such as new crosswalks or repainted lanes. By sharing a map dataset that labels the instances in which there are discrepancies with sensor data, we encourage the development of novel methods for detecting out-of-date map regions.
2 PAPERS • NO BENCHMARKS YET