SUN3D is a large-scale RGB-D video database comprising 415 sequences captured in 254 different spaces across 41 different buildings; 8 of the sequences are annotated, with each frame carrying a semantic segmentation of the objects in the scene and information about the camera pose. Moreover, some places have been captured multiple times at different moments of the day.
114 PAPERS • NO BENCHMARKS YET
KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) is one of the most popular datasets for use in mobile robotics and autonomous driving. It consists of hours of traffic scenarios recorded with a variety of sensor modalities, including high-resolution RGB, grayscale stereo cameras, and a 3D laser scanner. Despite its popularity, the dataset itself does not contain ground truth for semantic segmentation. However, various researchers have manually annotated parts of the dataset to fit their needs. Álvarez et al. generated ground truth for 323 images from the road detection challenge with three classes: road, vertical, and sky. Zhang et al. annotated 252 (140 for training and 112 for testing) acquisitions (RGB and Velodyne scans) from the tracking challenge for ten object categories: building, sky, road, vegetation, sidewalk, car, pedestrian, cyclist, sign/pole, and fence. Ros et al. labeled 170 training images and 46 testing images (from the visual odometry challenge).
3,233 PAPERS • 141 BENCHMARKS
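As a rough illustration of handling KITTI's raw sensor data, each Velodyne scan is a flat binary file of float32 values in (x, y, z, reflectance) order; a minimal loading sketch (the file path is hypothetical):

```python
import numpy as np

def load_velodyne_scan(path):
    """Load a KITTI Velodyne .bin scan as an (N, 4) float32 array.

    Each point is stored as four float32 values:
    x, y, z (metres, sensor frame) and reflectance.
    """
    return np.fromfile(path, dtype=np.float32).reshape(-1, 4)

# Hypothetical path; adjust to your local KITTI layout.
points = load_velodyne_scan("velodyne/000000.bin")
xyz, reflectance = points[:, :3], points[:, 3]
print(xyz.shape, reflectance.min(), reflectance.max())
```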
The NYU-Depth V2 data set comprises video sequences from a variety of indoor scenes as recorded by both the RGB and Depth cameras of the Microsoft Kinect. It features 1449 densely labeled pairs of aligned RGB and depth images, 464 scenes taken from 3 cities, and 407,024 unlabeled frames; each object is labeled with a class and an instance number.
844 PAPERS • 20 BENCHMARKS
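The labeled subset ships as a single MATLAB v7.3 file, which is HDF5 underneath and can be opened with h5py; a sketch, assuming the usual 'images' and 'depths' dataset names and the standard file name:

```python
import h5py
import numpy as np

# nyu_depth_v2_labeled.mat is a MATLAB v7.3 (HDF5) file.
with h5py.File("nyu_depth_v2_labeled.mat", "r") as f:
    # MATLAB stores arrays transposed relative to NumPy conventions.
    rgb = np.transpose(f["images"][0], (2, 1, 0))  # first frame, (480, 640, 3)
    depth = np.transpose(f["depths"][0], (1, 0))   # metric depth in metres
print(rgb.shape, depth.shape, float(depth.max()))
```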
TUM RGB-D is an RGB-D dataset. It contains the color and depth images of a Microsoft Kinect sensor along the ground-truth trajectory of the sensor. The data was recorded at full frame rate (30 Hz) and sensor resolution (640x480). The ground-truth trajectory was obtained from a high-accuracy motion-capture system with eight high-speed tracking cameras (100 Hz).
190 PAPERS • NO BENCHMARKS YET
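The ground-truth trajectory is a plain-text file with one pose per line in timestamp tx ty tz qx qy qz qw format (comment lines start with '#'), and the 16-bit depth PNGs use a scale factor of 5000, i.e. a pixel value of 5000 corresponds to 1 metre. A minimal parsing sketch:

```python
import numpy as np

def load_tum_trajectory(path):
    """Parse a TUM RGB-D groundtruth.txt file.

    Each non-comment line reads: timestamp tx ty tz qx qy qz qw.
    Returns (timestamps, translations, quaternions).
    """
    rows = [
        [float(v) for v in line.split()]
        for line in open(path)
        if not line.startswith("#")
    ]
    data = np.array(rows)
    return data[:, 0], data[:, 1:4], data[:, 4:8]

ts, trans, quat = load_tum_trajectory("groundtruth.txt")  # hypothetical path
# Depth PNGs: depth_in_metres = uint16_pixel_value / 5000.0
```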
The Make3D dataset is a monocular depth estimation dataset that contains 400 training pairs of RGB images and depth maps, and 134 test samples. The RGB images have high resolution, while the depth maps are provided at low resolution.
122 PAPERS • 1 BENCHMARK
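Because of this resolution mismatch, a common preprocessing step is to resample the depth maps to the image resolution (or vice versa); a generic sketch with PIL, where the array shapes are only illustrative and the actual Make3D file loading is not shown:

```python
import numpy as np
from PIL import Image

def resize_depth(depth, target_hw):
    """Bilinearly resample a float depth map to (H, W)."""
    h, w = target_hw
    img = Image.fromarray(depth.astype(np.float32), mode="F")
    return np.asarray(img.resize((w, h), resample=Image.BILINEAR))

# Illustrative shapes only: the RGB images are far larger than the depth maps.
low_res_depth = np.random.rand(55, 305).astype(np.float32)
print(resize_depth(low_res_depth, (2272, 1704)).shape)
```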
Middlebury 2006 is a stereo dataset of indoor scenes with multiple handcrafted layouts.
5 PAPERS • NO BENCHMARKS YET
The Middlebury Stereo dataset consists of high-resolution stereo sequences with complex geometry and pixel-accurate ground-truth disparity data. The ground-truth disparities are acquired using a novel technique that employs structured lighting and does not require the calibration of the light projectors.
205 PAPERS • 5 BENCHMARKS
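Recent Middlebury ground-truth disparities are distributed as PFM files: a short text header ("Pf" for grayscale, then width and height, then a scale whose sign gives the byte order), followed by raw floats stored bottom-to-top. A minimal reader (the file name is hypothetical):

```python
import numpy as np

def read_pfm(path):
    """Read a grayscale PFM file into a float32 array (e.g. a disparity map)."""
    with open(path, "rb") as f:
        assert f.readline().strip() == b"Pf"       # "Pf" marks a single-channel PFM
        width, height = map(int, f.readline().split())
        scale = float(f.readline())
        endian = "<" if scale < 0 else ">"         # negative scale = little-endian
        data = np.fromfile(f, dtype=endian + "f4", count=width * height)
    return data.reshape(height, width)[::-1]       # rows are stored bottom-to-top

disparity = read_pfm("disp0.pfm")
```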
A collection of high-quality 360° datasets with ground-truth depth annotations, built by re-using recently released large-scale 3D datasets and re-purposing them to 360° via rendering.
14 PAPERS • NO BENCHMARKS YET
4D Light Field Dataset is a light field benchmark consisting of 24 carefully designed synthetic, densely sampled 4D light fields with highly accurate disparity ground truth.
2 PAPERS • NO BENCHMARKS YET
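A densely sampled 4D light field can be handled as an array L[u, v, y, x] of views; fixing one angular and one spatial coordinate yields an epipolar plane image (EPI), whose line slopes encode disparity. A toy sketch on synthetic data, with the axis convention assumed:

```python
import numpy as np

# Toy light field: 9x9 angular views of 64x64 grayscale images,
# indexed as L[u, v, y, x] (angular row/column, spatial row/column).
L = np.random.rand(9, 9, 64, 64).astype(np.float32)

center_u, row_y = 4, 32
# Fix angular row u and spatial row y; varying angular column v against
# spatial column x gives a (v, x) epipolar plane image.
epi = L[center_u, :, row_y, :]   # shape (9, 64)
print(epi.shape)
```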
The DCM dataset is composed of 772 annotated images from 27 golden age comic books, collected from the public domain collection of digitized comic books of the Digital Comics Museum. One album per available publisher was selected to get as many different styles as possible. Ground-truth bounding boxes are provided for all panels and all characters (bodies and faces), whether small or large, human-like or animal-like.
4 PAPERS • 3 BENCHMARKS
DDAD is a new autonomous driving benchmark from TRI (Toyota Research Institute) for long range (up to 250m) and dense depth estimation in challenging and diverse urban conditions. It contains monocular videos and accurate ground-truth depth (across a full 360 degree field of view) generated from high-density LiDARs mounted on a fleet of self-driving cars operating in a cross-continental setting. DDAD contains scenes from urban settings in the United States (San Francisco, Bay Area, Cambridge, Detroit, Ann Arbor) and Japan (Tokyo, Odaiba).
55 PAPERS • 1 BENCHMARK
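Generating depth ground truth from LiDAR boils down to the standard pinhole projection: points expressed in the camera frame are mapped through the intrinsic matrix K, and their z values fill a sparse depth image. A generic sketch with made-up intrinsics and points (this is not DDAD's own dgp toolkit, and no occlusion reasoning is done):

```python
import numpy as np

def project_to_depth_map(points_cam, K, hw):
    """Project (N, 3) camera-frame points into a sparse depth image."""
    h, w = hw
    z = points_cam[:, 2]
    front = z > 0                                   # keep points in front of the camera
    uvw = (K @ points_cam[front].T).T               # homogeneous pixel coordinates
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    depth = np.zeros((h, w), dtype=np.float32)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    depth[v[inside], u[inside]] = z[front][inside]  # later points simply overwrite
    return depth

# Made-up intrinsics and points, for illustration only.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
pts = np.array([[1.0, 0.5, 10.0], [-2.0, 0.0, 50.0]])
print(project_to_depth_map(pts, K, (480, 640)).max())
```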
DrivingStereo contains over 180k images covering a diverse set of driving scenarios, which is hundreds of times larger than the KITTI Stereo dataset. High-quality labels of disparity are produced by a model-guided filtering strategy from multi-frame LiDAR points.
41 PAPERS • NO BENCHMARKS YET
ETH3D is a multi-view stereo / 3D reconstruction benchmark that covers a variety of indoor and outdoor scenes. Ground-truth geometry was obtained using a high-precision laser scanner. A DSLR camera as well as a synchronized multi-camera rig with varying fields of view were used to capture images.
80 PAPERS • 1 BENCHMARK
The endoscopic SLAM dataset (EndoSLAM) is a dataset for depth estimation approaches in endoscopic videos. It consists of both ex-vivo and synthetically generated data. The ex-vivo part includes standard as well as capsule endoscopy recordings. The dataset is divided into 35 sub-datasets: 18 for the colon, 5 for the small intestine, and 12 for the stomach.
3 PAPERS • NO BENCHMARKS YET
An in-the-wild stereo image dataset, comprising 49,368 image pairs contributed by users of the Holopix mobile social platform.
9 PAPERS • NO BENCHMARKS YET
For many fundamental scene understanding tasks, it is difficult or impossible to obtain per-pixel ground truth labels from real images. Hypersim is a photorealistic synthetic dataset for holistic indoor scene understanding. It contains 77,400 images of 461 indoor scenes with detailed per-pixel labels and corresponding ground truth geometry.
61 PAPERS • 1 BENCHMARK
The MegaDepth dataset is a dataset for single-view depth prediction that includes 196 different locations reconstructed via COLMAP SfM/MVS.
115 PAPERS • NO BENCHMARKS YET
A dataset for single-image 3D in the wild, consisting of detailed 3D geometry annotations for 140,000 images.
23 PAPERS • 2 BENCHMARKS
A new cross-season, scaleless monocular depth prediction dataset derived from the CMU Visual Localization dataset through structure from motion.
The Stanford Light Field Archive is a collection of several light fields for research in computer graphics and vision.
7 PAPERS • NO BENCHMARKS YET
Taskonomy provides a large and high-quality dataset of varied indoor scenes.
135 PAPERS • 2 BENCHMARKS
The Web Stereo Video Dataset consists of 553 stereoscopic videos from YouTube. This dataset has a wide variety of scene types, and features many nonrigid objects.
12 PAPERS • NO BENCHMARKS YET