The Matterport3D dataset is a large RGB-D dataset for scene understanding in indoor environments. It contains 10,800 panoramic views inside 90 real building-scale scenes, constructed from 194,400 RGB-D images. Each scene is a residential building consisting of multiple rooms and floor levels, and is annotated with surface construction, camera poses, and semantic segmentation.
378 PAPERS • 5 BENCHMARKS
SUNCG is a large-scale dataset of synthetic 3D scenes with dense volumetric annotations.
181 PAPERS • NO BENCHMARKS YET
SUN3D contains a large-scale RGB-D video database, with 8 annotated sequences. Each frame has a semantic segmentation of the objects in the scene and information about the camera pose. It is composed by 415 sequences captured in 254 different spaces, in 41 different buildings. Moreover, some places have been captured multiple times at different moments of the day.
114 PAPERS • NO BENCHMARKS YET
A dataset for single-image 3D in the wild consisting of annotations of detailed 3D geometry for 140,000 images.
23 PAPERS • 2 BENCHMARKS
DurLAR is a high-fidelity 128-channel 3D LiDAR dataset with panoramic ambient (near infrared) and reflectivity imagery for multi-modal autonomous driving applications. Compared to existing autonomous driving task datasets, DurLAR has the following novel features:
5 PAPERS • NO BENCHMARKS YET
Endoscopic stereo reconstruction for surgical scenes gives rise to specific problems, including the lack of clear corner features, highly specular surface properties, and the presence of blood and smoke. These issues present difficulties for both stereo reconstruction itself and also for standardised dataset production. We present a stereo-endoscopic reconstruction validation dataset based on cone-beam CT (SERV-CT). Two ex vivo small porcine full torso cadavers were placed within the view of the endoscope with both the endoscope and target anatomy visible in the CT scan. Subsequent orientation of the endoscope was manually aligned to match the stereoscopic view and benchmark disparities, depths and occlusions are calculated. The requirement of a CT scan limited the number of stereo pairs to 8 from each ex vivo sample. For the second sample an RGB surface was acquired to aid alignment of smooth, featureless surfaces. Repeated manual alignments showed an RMS disparity accuracy of around
4 PAPERS • NO BENCHMARKS YET
We present a large-scale dataset for 3D urban scene understanding. Compared to existing datasets, our dataset consists of 75 outdoor urban scenes with diverse backgrounds, encompassing over 15,000 images. These scenes offer 360◦ hemispherical views, capturing diverse foreground objects illuminated under various lighting conditions. Additionally, our dataset encompasses scenes that are not limited to forward-driving views, addressing the limitations of previous datasets such as limited overlap and coverage between camera views. The closest pre-existing dataset for generalizable evaluation is DTU [2] (80 scenes) which comprises mostly indoor objects and does not provide multiple foreground objects or background scenes.
3 PAPERS • 1 BENCHMARK
Pano3D is a new benchmark for depth estimation from spherical panoramas. Its goal is to drive progress for this task in a consistent and holistic manner. The Pano3D 360 depth estimation benchmark provides a standard Matterport3D train and test split, as well as a secondary GibsonV2 partioning for testing and training as well. The latter is used for zero-shot cross dataset transfer performance assessment and decomposes it into 3 different splits, each one focusing on a specific generalization axis.
2 PAPERS • NO BENCHMARKS YET
A dataset of 100K synthetic images of skin lesions, ground-truth (GT) segmentations of lesions and healthy skin, GT segmentations of seven body parts (head, torso, hips, legs, feet, arms and hands), and GT binary masks of non-skin regions in the texture maps of 215 scans from the 3DBodyTex.v1 dataset [2], [3] created using the framework described in [1]. The dataset is primarily intended to enable the development of skin lesion analysis methods. Synthetic image creation consisted of two main steps. First, skin lesions from the Fitzpatrick 17k dataset were blended onto skin regions of high-resolution three-dimensional human scans from the 3DBodyTex dataset [2], [3]. Second, two-dimensional renders of the modified scans were generated.
1 PAPER • NO BENCHMARKS YET