The Middlebury Stereo dataset consists of high-resolution stereo sequences with complex geometry and pixel-accurate ground-truth disparity data. The ground-truth disparities are acquired using a novel technique that employs structured lighting and does not require the calibration of the light projectors.
176 PAPERS • 5 BENCHMARKS
MPI (Max Planck Institute) Sintel is a dataset for optical flow evaluation that has 1064 synthesized stereo images and ground-truth disparity data. Sintel is derived from the open-source 3D animated short film Sintel. The dataset has 23 different scenes. The stereo images are RGB while the disparity is grayscale. Both have a resolution of 1024×436 pixels and 8 bits per channel.
138 PAPERS • 4 BENCHMARKS
ETH3D is a multi-view stereo / 3D reconstruction benchmark that covers a variety of indoor and outdoor scenes. Ground-truth geometry has been obtained using a high-precision laser scanner. A DSLR camera as well as a synchronized multi-camera rig with varying fields of view were used to capture images.
43 PAPERS • 1 BENCHMARK
The Middlebury 2014 dataset contains a set of 23 high-resolution stereo pairs for which known camera calibration parameters and ground-truth disparity maps obtained with a structured-light scanner are available. The images in the Middlebury dataset all show static indoor scenes of varying difficulty, including repetitive structures, occlusions, wiry objects, and untextured areas.
38 PAPERS • 2 BENCHMARKS
The Web Stereo Video Dataset consists of 553 stereoscopic videos from YouTube. This dataset has a wide variety of scene types, and features many nonrigid objects.
9 PAPERS • NO BENCHMARKS YET
The Multi Vehicle Stereo Event Camera (MVSEC) dataset is a collection of data designed for the development of novel 3D perception algorithms for event-based cameras. Stereo event data is collected from a car, a motorbike, a hexacopter, and handheld rigs, and fused with lidar, IMU, motion capture, and GPS to provide ground-truth pose and depth images.
8 PAPERS • NO BENCHMARKS YET
Middlebury 2005 is a stereo dataset of indoor scenes.
7 PAPERS • NO BENCHMARKS YET
Middlebury 2006 is a stereo dataset of indoor scenes with multiple handcrafted layouts.
5 PAPERS • NO BENCHMARKS YET
Middlebury 2001 is a stereo dataset of indoor scenes with multiple handcrafted layouts.
3 PAPERS • NO BENCHMARKS YET
PedX is a large-scale multi-modal collection of pedestrians at complex urban intersections. The dataset provides high-resolution stereo images and LiDAR data with manual 2D and automatic 3D annotations. The data was captured using two pairs of stereo cameras and four Velodyne LiDAR sensors.
Endoscopic stereo reconstruction for surgical scenes gives rise to specific problems, including the lack of clear corner features, highly specular surface properties, and the presence of blood and smoke. These issues present difficulties both for stereo reconstruction itself and for standardised dataset production. We present a stereo-endoscopic reconstruction validation dataset based on cone-beam CT (SERV-CT). Two ex vivo small porcine full-torso cadavers were placed within the view of the endoscope, with both the endoscope and the target anatomy visible in the CT scan. The endoscope orientation was then manually aligned to match the stereoscopic view, and benchmark disparities, depths, and occlusions were calculated. The requirement of a CT scan limited the number of stereo pairs to 8 from each ex vivo sample. For the second sample, an RGB surface was acquired to aid alignment of smooth, featureless surfaces. Repeated manual alignments showed an RMS disparity accuracy of around
UASOL is an RGB-D stereo dataset that contains 160,902 frames filmed at 33 different scenes, each with between 2k and 10k frames. The frames show different paths from the perspective of a pedestrian, including sidewalks, trails, roads, etc. The images were extracted from 15 fps video files at HD2K resolution (2280 × 1282 pixels). The dataset also provides a GPS geolocation tag for each second of the sequences and reflects different weather conditions. Up to 4 different people filmed the dataset at different times of the day.
3 PAPERS • 1 BENCHMARK
Middlebury 2003 is a stereo dataset for indoor scenes.
2 PAPERS • NO BENCHMARKS YET
RealEstate10K is a large dataset of camera poses corresponding to 10 million frames derived from about 80,000 video clips, gathered from about 10,000 YouTube videos. For each clip, the poses form a trajectory where each pose specifies the camera position and orientation along the trajectory. These poses are derived by running SLAM and bundle adjustment algorithms on a large set of videos.
2 PAPERS • 1 BENCHMARK
This bus trajectory dataset was collected by 6 volunteers who were asked to travel across the suburban city of Durgapur, India, on intra-city buses (route name: 54 Feet). During their travel, the volunteers captured sensor logs through an Android application installed on COTS smartphones.
1 PAPER • NO BENCHMARKS YET
Given the difficulty of handling planetary data, we provide downloadable files in PNG format from the Chang'E-3 and Chang'E-4 missions, along with a set of scripts to perform the conversion given a different PDS4 dataset.
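To illustrate what such a conversion involves (a minimal sketch, not the authors' scripts), a PDS4 image array could be read with the pds4_tools package and written out as a PNG with Pillow; the file names below are hypothetical.

```python
import numpy as np
import pds4_tools              # NASA's official PDS4 reader
from PIL import Image

def pds4_to_png(label_path, out_path):
    """Read the first image array from a PDS4 product and save it as an 8-bit PNG."""
    structures = pds4_tools.read(label_path)            # parse the XML label and its data file
    data = np.asarray(structures[0].data, dtype=np.float64)
    lo, hi = float(data.min()), float(data.max())
    # Rescale to 0-255 so the PNG is viewable regardless of the original bit depth.
    scaled = np.zeros_like(data) if hi == lo else (data - lo) / (hi - lo) * 255.0
    Image.fromarray(scaled.astype(np.uint8)).save(out_path)

# Hypothetical file names; actual Chang'E products follow mission-specific naming.
pds4_to_png("PCAML_image_label.xml", "PCAML_image.png")
```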
The dataset has been generated using Town 1 and Town 2 of the CARLA Simulator. It consists of 48 camera configurations, 24 per town. The parameters modified to generate the configurations are fov, x, y, z, pitch, yaw, and roll, where fov is the field of view, (x, y, z) is the translation, and (pitch, yaw, roll) is the rotation between the cameras. The total number of image pairs is 79,320, of which 18,083 belong to Town 1 and 61,237 to Town 2; the difference in counts is due to the length of the tracks.
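As a hedged illustration (not part of the dataset's tooling) of how one such configuration maps to a relative camera pose, the translation and pitch/yaw/roll angles can be assembled into a 4×4 homogeneous transform; the axis conventions and units below are assumptions and should be checked against the CARLA documentation.

```python
import numpy as np

def camera_extrinsic(x, y, z, pitch, yaw, roll):
    """Build a 4x4 transform from a translation and pitch/yaw/roll angles (degrees).
    Assumes roll about x, pitch about y, yaw about z; verify against CARLA's conventions."""
    p, ya, r = np.radians([pitch, yaw, roll])
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(r), -np.sin(r)],
                   [0, np.sin(r),  np.cos(r)]])   # roll about x
    Ry = np.array([[ np.cos(p), 0, np.sin(p)],
                   [0, 1, 0],
                   [-np.sin(p), 0, np.cos(p)]])   # pitch about y
    Rz = np.array([[np.cos(ya), -np.sin(ya), 0],
                   [np.sin(ya),  np.cos(ya), 0],
                   [0, 0, 1]])                    # yaw about z
    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx
    T[:3, 3] = [x, y, z]
    return T

# Example: second camera offset 0.5 units laterally and yawed by 5 degrees.
print(camera_extrinsic(0.0, 0.5, 0.0, pitch=0.0, yaw=5.0, roll=0.0))
```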
DurLAR is a high-fidelity 128-channel 3D LiDAR dataset with panoramic ambient (near-infrared) and reflectivity imagery for multi-modal autonomous driving applications. Compared to existing autonomous driving datasets, DurLAR offers several novel features.
A simulated benchmark for evaluating multi-modal SLAM systems in large-scale dynamic environments.
This dataset contains grayscale mono and stereo images (NavCam and LocCam) from laboratory tests performed by a prototype rover on a Mars-like testbed. The dataset can be used for artificial sample-tube detection and pose estimation. It also contains synthetic color images of the sample tube in a Martian scenario created with Unreal Engine.
Middlebury MVS is the earliest MVS dataset for multi-view stereo network evaluation. It contains two indoor objects with low-resolution (640 × 480) images and calibrated cameras.
A total of 80 real material samples were captured in a dark room. For each material, multiple captures were collected at different distances from the camera (between 250 and 650 mm) to observe both macro- and micro-level details. The dataset consists mostly of planar specimens but also includes non-planar objects such as mugs, globes, crumpled paper, etc. It contains a rich diversity of materials, including diffuse or specular wrapping papers, fabrics, anisotropic metals, plastics, rugs, ceramic and wood flooring samples, etc. Each capture set includes 12 LDR (8 bpp) RGB-D images at 4K pixel resolution. Each set is captured at 50% and 100% of maximum light intensity. In total, we captured 462 such image sets (combinations of light intensity, distance to the camera, and material sample).
The ARPA-E-funded TERRA-REF project is generating open-access reference datasets for the study of plant sensing, genomics, and phenomics. Sensor data were generated by a field-scanner sensing platform that captures color, thermal, hyperspectral, and active fluorescence imagery as well as three-dimensional structure and associated environmental measurements. This dataset is provided alongside data collected using traditional field methods in order to support calibration and validation of algorithms used to extract plot-level phenotypes from these datasets.
Includes several sets of synthetic stereo images labelled with grasp rectangles representing parallel-jaw grasps (Cornell-like format).
THEOStereo is a dataset providing synthetic stereo image pairs and their corresponding scene depth, and will be published along with [1]. All images follow the omnidirectional camera model. In total, there are 31,250 omnidirectional image pairs. The training set contains 25,000 image pairs; the validation and test sets contain 3,125 image pairs each. For each pair, there is a ground-truth depth map describing the pixel-wise distance of the object along the left camera's z-axis. The virtual omnidirectional cameras exhibit an FOV of 180 degrees and can be described using Kannala's camera model [2]. The distortion parameters are k_1 = 1 and k_2 = k_3 = k_4 = k_5 = 0. The length of the stereo camera's baseline was 0.3 AU (approx. 15 cm, not 30 cm!). Please do not forget to cite [1] if you use the dataset in your work. Thank you.
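For orientation (a minimal sketch, not the dataset's own code): with the coefficients quoted above, the Kannala-Brandt model reduces to the equidistant fisheye projection r(θ) = θ, so a 3D point in the left camera frame projects as below; the focal length and principal point used in the example are hypothetical.

```python
import numpy as np

def kannala_project(X, f, cx, cy, k=(1.0, 0.0, 0.0, 0.0, 0.0)):
    """Project a 3D point X = (x, y, z) with the Kannala-Brandt fisheye model:
    r(theta) = k1*theta + k2*theta^3 + k3*theta^5 + k4*theta^7 + k5*theta^9.
    With k = (1, 0, 0, 0, 0), as stated for THEOStereo, this is the equidistant model r = theta.
    f, cx, cy are an assumed focal length and principal point in pixels."""
    x, y, z = X
    theta = np.arctan2(np.hypot(x, y), z)                      # angle from the optical axis
    r = sum(ki * theta ** (2 * i + 1) for i, ki in enumerate(k))
    phi = np.arctan2(y, x)                                     # azimuth around the optical axis
    u = f * r * np.cos(phi) + cx
    v = f * r * np.sin(phi) + cy
    return u, v

# Hypothetical intrinsics: a 180-degree FOV maps theta = pi/2 towards the image border.
print(kannala_project((0.2, -0.1, 1.0), f=300.0, cx=512.0, cy=512.0))
```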
0 PAPER • NO BENCHMARKS YET