The dataset contains two subsets of synthetic, semantically segmented road-scene images created for developing and applying the methodology described in the paper "A Sim2Real Deep Learning Approach for the Transformation of Images from Multiple Vehicle-Mounted Cameras to a Semantically Segmented Image in Bird's Eye View" (IEEE Xplore, arXiv, YouTube).
2 PAPERS • 2 BENCHMARKS
The CVACT dataset is a cross-view matching benchmark between street and aerial views collected in Canberra, Australia. The task is to localize street-view images without GPS coordinates. Google Street View panoramas serve as ground images, and the matching aerial images are obtained from the Google Maps API. The dataset comprises 35,532 image pairs for training and 8,884 image pairs for evaluation, with recall as the primary evaluation metric. To test generalization beyond the CVUSA dataset, CVACT additionally provides 92,802 test images.
8 PAPERS • 2 BENCHMARKS
The CVUSA dataset is a cross-view matching benchmark between street and aerial views from different regions of the US. The task is to localize street-view images without GPS coordinates. Google Street View panoramas serve as ground images, and matching aerial images at zoom level 19 are obtained from Microsoft Bing Maps. The dataset comprises 35,532 image pairs for training and 8,884 image pairs for testing, with recall as the primary evaluation metric.
14 PAPERS • 2 BENCHMARKS
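Recall@k, the metric both CVACT and CVUSA use, measures the fraction of ground-view queries whose true aerial match appears among the k most similar aerial candidates. A minimal sketch of this computation, assuming paired ground/aerial embeddings indexed in correspondence and cosine similarity (function and variable names here are illustrative, not the benchmarks' official evaluation code):

```python
import numpy as np

def recall_at_k(ground_emb, aerial_emb, k=1):
    """Fraction of ground queries whose true aerial match (same row
    index) ranks in the top-k aerial candidates by cosine similarity."""
    # L2-normalize so the dot product equals cosine similarity
    g = ground_emb / np.linalg.norm(ground_emb, axis=1, keepdims=True)
    a = aerial_emb / np.linalg.norm(aerial_emb, axis=1, keepdims=True)
    sims = g @ a.T                       # (n_queries, n_candidates)
    # rank of the true match = number of candidates scoring strictly higher
    true_scores = np.diag(sims)
    ranks = (sims > true_scores[:, None]).sum(axis=1)
    return float((ranks < k).mean())

# toy check: identical embeddings should retrieve perfectly
emb = np.random.default_rng(0).normal(size=(3, 8))
print(recall_at_k(emb, emb, k=1))  # → 1.0
```

Published results on these benchmarks typically report recall at several cutoffs (e.g. r@1, r@5, r@10, r@1%).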
The Dayton dataset is a dataset for ground-to-aerial (or aerial-to-ground) image translation, also known as cross-view image synthesis. It contains paired street-view and aerial-view images of roads: 76,048 images in total, with a 55,000/21,048 train/test split. The images in the original dataset have a resolution of 354×354 pixels.
13 PAPERS • 4 BENCHMARKS