ChangeSim is a dataset aimed at online scene change detection (SCD) and more. The data is collected in photo-realistic simulation environments with the presence of environmental non-targeted variations, such as air turbidity and light condition changes, as well as targeted object changes in industrial indoor environments. By collecting data in simulations, multi-modal sensor data and precise ground truth labels are obtainable such as the RGB image, depth image, semantic segmentation, change segmentation, camera poses, and 3D reconstructions. While the previous online SCD datasets evaluate models given well-aligned image pairs, ChangeSim also provides raw unpaired sequences that present an opportunity to develop an online SCD model in an end-to-end manner, considering both pairing and detection. Experiments show that even the latest pair-based SCD models suffer from the bottleneck of the pairing process, and it gets worse when the environment contains the non-targeted variations.
4 PAPERS • 2 BENCHMARKS
OMMO is a new benchmark for several outdoor NeRF-based tasks, such as novel view synthesis, surface reconstruction, and multi-modal NeRF. It contains complex objects and scenes with calibrated images, point clouds and prompt annotations.
4 PAPERS • NO BENCHMARKS YET
We introduced this dataset in Points2Surf, a method that turns point clouds into meshes.
The dataset is split between train, test and val folders.
This dataset contains a variety of common urban road objects scanned with a Velodyne HDL-64E LIDAR, collected in the CBD of Sydney, Australia. There are 631 individual scans of objects across classes of vehicles, pedestrians, signs and trees.
4 PAPERS • 1 BENCHMARK
🤖 Robo3D - The WOD-C Benchmark WOD-C is an evaluation benchmark heading toward robust and reliable 3D perception in autonomous driving. With it, we probe the robustness of 3D detectors and segmentors under out-of-distribution (OoD) scenarios against corruptions that occur in the real-world environment. Specifically, we consider natural corruptions happen in the following cases:
A new dataset with significant occlusions related to object manipulation.
We established a 3D evaluation benchmark, 3D MM-Vet, severing as assessing the 4-level capacity in embodied interaction scenarios, varying from basic perception to control statements generation.
3 PAPERS • 1 BENCHMARK
Dynamic Human Bodies dataset (DHB), containing 10 point cloud sequences from the MITAMA dataset and 4 from the 8IVFB dataset. The sequences in DHB record 3D human motions with large and non-rigid deformation in real world. The overall dataset contains more than 3000 point cloud frames. And each frame has 1024 points.
Ford Campus Vision and Lidar Data Set is a dataset collected by an autonomous ground vehicle testbed, based upon a modified Ford F-250 pickup truck. The vehicle is outfitted with a professional (Applanix POS LV) and consumer (Xsens MTI-G) Inertial Measuring Unit (IMU), a Velodyne 3D-lidar scanner, two push-broom forward looking Riegl lidars, and a Point Grey Ladybug3 omnidirectional camera system.
3 PAPERS • NO BENCHMARKS YET
We present the HANDAL dataset for category-level object pose estimation and affordance prediction. Unlike previous datasets, ours is focused on robotics-ready manipulable objects that are of the proper size and shape for functional grasping by robot manipulators, such as pliers, utensils, and screwdrivers. Our annotation process is streamlined, requiring only a single off-the-shelf camera and semi-automated processing, allowing us to produce high-quality 3D annotations without crowd-sourcing. The dataset consists of 308k annotated image frames from 2.2k videos of 212 real-world objects in 17 categories. We focus on hardware and kitchen tool objects to facilitate research in practical scenarios in which a robot manipulator needs to interact with the environment beyond simple pushing or indiscriminate grasping. We outline the usefulness of our dataset for 6-DoF category-level pose+scale estimation and related tasks. We also provide 3D reconstructed meshes of all objects, and we outline s
KAIST-Lane (K-Lane) is the world’s first and the largest public urban road and highway lane dataset for Lidar. K-Lane has more than 15K frames and contains annotations of up to six lanes under various road and traffic conditions, e.g., occluded roads of multiple occlusion levels, roads at day and night times, merging (converging and diverging) and curved lanes.
MVHand is a new multi-view hand posture dataset to obtain complete 3D point clouds of the hand in the real world.
The BiGe corpus is comprised of 54.360 shots of interest extracted from TED and TEDx talks. All shots are tracked with fully 3d landmarks.
2 PAPERS • NO BENCHMARKS YET
Building3D is an urban-scale dataset consisting of more than 160 thousands buildings along with corresponding point clouds, mesh and wireframe models, covering 16 cities in Estonia about 998 Km2. Besides mesh models and real-world LiDAR point clouds, it also includes wireframe models.
Vehicle-to-Everything (V2X) network has enabled collaborative perception in autonomous driving, which is a promising solution to the fundamental defect of stand-alone intelligence including blind zones and long-range perception. However, the lack of datasets has severely blocked the development of collaborative perception algorithms. In this work, we release DOLPHINS: Dataset for cOllaborative Perception enabled Harmonious and INterconnected Self-driving, as a new simulated large-scale various-scenario multi-view multi-modality autonomous driving dataset, which provides a ground-breaking benchmark platform for interconnected autonomous driving. DOLPHINS outperforms current datasets in six dimensions: temporally-aligned images and point clouds from both vehicles and Road Side Units (RSUs) enabling both Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I) based collaborative perception; 6 typical scenarios with dynamic weather conditions make the most various interconnected auton
The Freiburg Spatial Relations dataset features 546 scenes each containing two out of 25 household objects. The depicted spatial relations can roughly be described as on top, on top on the corner, inside, inside and inclined, next to, and inclined. The dataset contains the 25 object models as textured .obj and .dae files, a low resolution .dae version for visualization in rviz, a scene description file containing the translation and rotation of the objects for each scene, a file with labels for each scene, the 15 splits used for cross validation, and a bash script to convert the models to pointclouds.
Human-M3 is an outdoor multi-modal multi-view multi-person human pose database which includes not only multi-view RGB videos of outdoor scenes but also corresponding pointclouds.
A Simulated Benchmark for multi-modal SLAM Systems Evaluation in Large-scale Dynamic Environments.
Pano3D is a new benchmark for depth estimation from spherical panoramas. Its goal is to drive progress for this task in a consistent and holistic manner. The Pano3D 360 depth estimation benchmark provides a standard Matterport3D train and test split, as well as a secondary GibsonV2 partioning for testing and training as well. The latter is used for zero-shot cross dataset transfer performance assessment and decomposes it into 3 different splits, each one focusing on a specific generalization axis.
Real 3D-AD is the first point cloud anomaly detection dataset for industrial products. Real3D-AD comprises a total of 1,254 samples that are distributed across 12 distinct categories. These categories include Airplane, Car, Candybar, Chicken, Diamond, Duck, Fish, Gemstone, Seahorse, Shell, Starfish, and Toffees. Each training sample is an absence of blind spots, and a realistic, high-accuracy prototype.
2 PAPERS • 1 BENCHMARK
Dataset of low fidelity resolutions of the RANS equations over airfoils.
1 PAPER • NO BENCHMARKS YET
Dataset consist of both real captures from Photoneo PhoXi structured light scanner devices annotated by hand and synthetic samples produced by custom generator. In comparison with existing datasets for 6D pose estimation, some notable differences include:
1 PAPER • 1 BENCHMARK
Depth vision has been recently used in many locomotion devices with the objective to ease the life of disabled people toward reaching more ecological lifestyle. This is due to the fact that such cameras are cheap, compact and can provide rich information about the environment. Our dataset provides many recordings of point cloud and other types of data during different locomotion modes in urban context. If you used this data, please cite the following papers below: 1-Depth Vision based Terrain Detection Algorithm during Human Locomotion 2-Using Depth Vision for Terrain Detection during Active Locomotion
BigBIRD is a 3D dataset of 125 objects, with the following data for each object:
CLAD (Compled and Long Activities Dataset) is an activity dataset which exhibits real-life and diverse scenarios of complex, temporally-extended human activities and actions. The dataset consists of a set of videos of actors performing everyday activities in a natural and unscripted manner. The dataset was recorded using a static Kinect 2 sensor which is commonly used on many robotic platforms. The dataset comprises of RGB-D images, point cloud data, automatically generated skeleton tracks in addition to crowdsourced annotations.
The Cooperative Driving dataset is a synthetic dataset generated using CARLA that contains lidar data from multiple vehicles navigating simultaneously through a diverse set of driving scenarios. This dataset was created to enable further research in multi-agent perception (cooperative perception) including cooperative 3D object detection, cooperative object tracking, multi-agent SLAM and point cloud registration. Towards that goal, all the frames have been labelled with ground-truth sensor pose and 3D object bounding boxes.
A dataset for position-constrained robot grasp planning.
To study the data-scarcity mitigation for learning-based visual localization methods via sim-to-real transfer, we curate and now present the CrossLoc benchmark datasets—a multimodal aerial sim-to-real data available for flights above nature and urban terrains. Unlike the previous computer vision datasets focusing on localization in a single domain (mostly real RGB images), the provided benchmark datasets include various multimodal synthetic cues paired to all real photos. Complementary to the paired real and synthetic data, we offer rich synthetic data that efficiently fills the flight envelope volume in the vicinity of the real data.
DrivAerNet is a large-scale, high-fidelity CFD dataset of 3D industry-standard car shapes designed for data-driven aerodynamic design. It comprises 4000 high-quality 3D car meshes and their corresponding aerodynamic performance coefficients, alongside full 3D flow field information.
EUEN17037 Daylight and View Standard Test Dataset.
The dataset contains synthetic training, validation and test data for occupancy grid mapping from lidar point clouds. Additionally, real-world lidar point clouds from a test vehicle with the same lidar setup as the simulated lidar sensor is provided. Point clouds are stored as PCD files and occupancy grid maps are stored as PNG images whereas one image channel describes evidence for a free and another one describes evidence for occupied cell state.
The data set contains point cloud data captured in an indoor environment with precise localization and ground truth mapping information. Two ”stop-and-go” data sequences of a robot with mounted Ouster OS1-128 lidar are provided. This data-capturing strategy allows recording lidar scans that do not suffer from an error caused by sensor movement. Individual scans from static robot positions are recorded. Additionally, point clouds recorded with the Leica BLK360 scanner are provided as mapping ground-truth data.
Involves data where a robot interacts with 5.1 cm colored blocks to complete an order-fulfillment style block stacking task. It contains dynamic scenes and real time-series data in a less constrained environment than comparable datasets. There are nearly 12,000 stacking attempts and over 2 million frames of real data.
A challenging multi-frame interpolation dataset for autonomous driving scenarios. Based on the principle of hard-sample selection and the diversity of scenarios, NL-Drive dataset contains point cloud sequences with large nonlinear movements from three public large-scale autonomous driving datasets: KITTI, Argoverse and Nuscenes. The overall dataset contains more than 20,000 LiDAR point cloud frames. The frame rate of point cloud sequence is 10Hz. And NL-Drive dataset is split into the training, validation and test set in the ratio of 14:3:3. For the point cloud interpolation task, the point cloud frame input is selected at a given interval of frames, and the remaining point clouds as the ground truth of the interpolation frame. Particularly, each sample of NL-Drive dataset is 4 point cloud frames of 2.5Hz when there are 3 interpolation frames to predict between the middle two input frames.
Near-Collision is a large-scale dataset of 13,658 egocentric video snippets of humans navigating in indoor hallways. In order to obtain ground truth annotations of human pose, the videos are provided with the corresponding 3D point cloud from LIDAR.
Robot@Home2, is an enhanced version aimed at improving usability and functionality for developing and testing mobile robotics and computer vision algorithms. Robot@Home2 consists of three main components. Firstly, a relational database that states the contextual information and data links, compatible with Standard Query Language. Secondly,a Python package for managing the database, including downloading, querying, and interfacing functions. Finally, learning resources in the form of Jupyter notebooks, runnable locally or on the Google Colab platform, enabling users to explore the dataset without local installations. These freely available tools are expected to enhance the ease of exploiting the Robot@Home dataset and accelerate research in computer vision and robotics.
Electromagnetic (EM) showers simulated dataset. The data contains 16,577 showers. The data includes information about the tracklets: position coordinates, direction and shower id, and about the showers: shower id, initial particle position and direction, shower energy.
We present the first fine-grained dataset of 1,497 3D VR sketch and 3D shape pairs for 1,005 chair shapes with large shapes diversity from the ShapeNetCore dataset from 50 participants.
The ARPA-E funded TERRA-REF project is generating open-access reference datasets for the study of plant sensing, genomics, and phenomics. Sensor data were generated by a field scanner sensing platform that captures color, thermal, hyperspectral, and active flourescence imagery as well as three dimensional structure and associated environmental measurements. This dataset is provided alongside data collected using traditional field methods in order to support calibration and validation of algorithms used to extract plot level phenotypes from these datasets.
The RBO dataset of articulated objects and interactions is a collection of 358 RGB-D video sequences (67:18 minutes) of humans manipulating 14 articulated objects under varying conditions (light, perspective, background, interaction). All sequences are annotated with ground truth of the poses of the rigid parts and the kinematic state of the articulated object (joint states) obtained with a motion capture system. We also provide complete kinematic models of these objects (kinematic structure and three-dimensional textured shape models). In 78 sequences the contact wrenches during the manipulation are also provided.
The increasing use of deep learning techniques has reduced interpretation time and, ideally, reduced interpreter bias by automatically deriving geological maps from digital outcrop models. However, accurate validation of these automated mapping approaches is a significant challenge due to the subjective nature of geological mapping and the difficulty in collecting quantitative validation data. Additionally, many state-of-the-art deep learning methods are limited to 2D image data, which is insufficient for 3D digital outcrops, such as hyperclouds. To address these challenges, we present Tinto, a multi-sensor benchmark digital outcrop dataset designed to facilitate the development and validation of deep learning approaches for geological mapping, especially for non-structured 3D data like point clouds. Tinto comprises two complementary sets: 1) a real digital outcrop model from Corta Atalaya (Spain), with spectral attributes and ground-truth data, and 2) a synthetic twin that uses latent
Overview: This dataset encompasses a compilation of 6,700 executed scoops (excavations), mapped across a vast spectrum of materials, terrain topography, and compositions.
UAV Laser Scanning data collected over neotropical forest (Paracou French Guiana). Four flights conducted over one ha plot in 2021 and 2022.
A simulated dataset built in Unreal Engine 4 with AirSim. Designed for visual point cloud change detection. Including GT point clouds before changes and after changes. Besides, 4 trajectories with stereo camera and IMU data are recorded for change detection task.
From https://github.com/MMintLab/VIRDO/blob/master/data/dataset_readme.txt,
This dataset provides wireless measurements from two industrial testbeds: iV2V (industrial Vehicle-to-Vehicle) and iV2I+ (industrial Vehicular-to-Infrastructure plus sensor).
A cross-city UDA benchmark built upon nuScenes.