For Mono3DRefer, we sample 2025 image frames from the original KITTI dataset; the resulting dataset contains 41,140 expressions in total and a vocabulary of 5,271 words.
1 PAPER • 1 BENCHMARK
Synthetic dataset of over 13,000 images of damaged and intact parcels with full 2D and 3D annotations in the COCO format. For details, see our paper; for visual samples, see our project page.
2 PAPERS • NO BENCHMARKS YET
The aiMotive dataset is a multimodal dataset for robust autonomous driving with long-range perception. It consists of 176 scenes with synchronized and calibrated LiDAR, camera, and radar sensors covering a 360-degree field of view. The data was captured in highway, urban, and suburban areas during daytime, night, and rain, and is annotated with 3D bounding boxes with consistent identifiers across frames.
4D-OR includes a total of 6734 scenes, recorded at one frame per second by six calibrated RGB-D Kinect sensors mounted to the ceiling of the OR, providing synchronized RGB and depth images. We provide fused point cloud sequences of entire scenes, automatically annotated human 6D poses, and 3D bounding boxes for OR objects. Furthermore, we provide SSG annotations for each step of the surgery, together with the clinical roles of all the humans in the scenes, e.g., nurse, head surgeon, anesthesiologist.
8 PAPERS • 1 BENCHMARK
A large-scale V2X perception dataset using CARLA and OpenCDA
24 PAPERS • 1 BENCHMARK
The nuScenes dataset is a large-scale autonomous driving dataset. It has 3D bounding boxes for 1,000 scenes collected in Boston and Singapore. Each scene is 20 seconds long and annotated at 2 Hz. This results in a total of 28,130 samples for training, 6,019 samples for validation, and 6,008 samples for testing. The dataset has the full autonomous vehicle data suite: 32-beam LiDAR, 6 cameras, and radars with complete 360° coverage. The 3D object detection challenge evaluates performance on 10 classes: cars, trucks, buses, trailers, construction vehicles, pedestrians, motorcycles, bicycles, traffic cones, and barriers.
1,608 PAPERS • 20 BENCHMARKS
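As a quick sanity check on the nuScenes figures quoted above, the following minimal sketch compares the stated split sizes with the nominal count implied by 20-second scenes annotated at 2 Hz. The variable names are illustrative only and are not part of any nuScenes tooling; the numbers are taken directly from the description.

```python
# Figures as quoted in the nuScenes description above.
splits = {"train": 28130, "val": 6019, "test": 6008}
num_scenes = 1000
scene_seconds = 20   # stated scene length
annotation_hz = 2    # stated annotation rate

# Total annotated samples across all three splits.
total_samples = sum(splits.values())

# Nominal samples per scene if every scene were exactly 20 s at 2 Hz.
nominal_per_scene = scene_seconds * annotation_hz

# Average samples per scene implied by the quoted split sizes.
avg_per_scene = total_samples / num_scenes

print(f"total samples:      {total_samples}")
print(f"nominal per scene:  {nominal_per_scene}")
print(f"average per scene:  {avg_per_scene:.3f}")
```

The quoted total (40,157) sits slightly above the nominal 40 samples per scene, which presumably reflects scene lengths being approximately, rather than exactly, 20 seconds.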
The Stanford 3D Indoor Scene Dataset (S3DIS) contains 6 large-scale indoor areas with 271 rooms. Each point in the scene point cloud is annotated with one of 13 semantic categories.
422 PAPERS • 10 BENCHMARKS
H3D (Humans in 3D) is a dataset of annotated people. The annotations include:
0 PAPER • NO BENCHMARKS YET
Falling Things (FAT) is a dataset for advancing the state of the art in object detection and 3D pose estimation in the context of robotics. It consists of 60k generated photorealistic images with accurate 3D pose annotations for all objects.
6 PAPERS • NO BENCHMARKS YET
H3D is a large-scale, full-surround 3D multi-object detection and tracking dataset. It is gathered from the HDD dataset, a large-scale naturalistic driving dataset collected in the San Francisco Bay Area. H3D has the following features:
34 PAPERS • NO BENCHMARKS YET
The Waymo Open Dataset comprises high-resolution sensor data collected by autonomous vehicles operated by the Waymo Driver in a wide variety of conditions.
383 PAPERS • 12 BENCHMARKS