Rethinking Visual Geo-localization for Large-Scale Applications

CVPR 2022 · Gabriele Berton, Carlo Masone, Barbara Caputo

Visual Geo-localization (VG) is the task of estimating the position where a given photo was taken by comparing it with a large database of images of known locations. To investigate how existing techniques would perform on a real-world city-wide VG application, we build San Francisco eXtra Large, a new dataset covering a whole city and providing a wide range of challenging cases, with a size 30x bigger than the previous largest dataset for visual geo-localization. We find that current methods fail to scale to such large datasets, so we design a new, highly scalable training technique called CosPlace, which casts training as a classification problem and thereby avoids the expensive mining needed by the commonly used contrastive learning. We achieve state-of-the-art performance on a wide range of datasets and find that CosPlace is robust to heavy domain changes. Moreover, we show that, compared to the previous state-of-the-art, CosPlace requires roughly 80% less GPU memory at train time and achieves better results with 8x smaller descriptors, paving the way for city-wide real-world visual geo-localization. Dataset, code and trained models are available for research purposes at https://github.com/gmberton/CosPlace.
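
To make the idea of casting training as classification more concrete, here is a minimal sketch. It assumes each training image comes with a planar position in metres and that all images falling in the same square cell share one class label. The function names, the 10 m cell size, and the sample coordinates are illustrative assumptions, not the released CosPlace code.

```python
def cell_index(east_m, north_m, cell_size_m=10.0):
    """Return the integer (column, row) index of the square cell containing a position."""
    return (int(east_m // cell_size_m), int(north_m // cell_size_m))


def build_class_labels(positions, cell_size_m=10.0):
    """Assign one class id per occupied cell (hypothetical helper).

    positions: iterable of (east_m, north_m) tuples, one per training image.
    Returns (labels, cell_to_class), where labels[i] is the class id of image i.
    """
    cell_to_class = {}
    labels = []
    for east_m, north_m in positions:
        cell = cell_index(east_m, north_m, cell_size_m)
        # The first image seen in a cell creates a new class id for that cell.
        if cell not in cell_to_class:
            cell_to_class[cell] = len(cell_to_class)
        labels.append(cell_to_class[cell])
    return labels, cell_to_class


# Made-up positions: three images in one cell, one image in a neighbouring cell.
positions = [(1.0, 2.0), (3.0, 9.9), (12.0, 2.0), (1.5, 2.5)]
labels, cell_to_class = build_class_labels(positions)
print(labels)
```

Once every image has a label like this, a standard classification loss can be applied directly, so no positive or negative pairs have to be mined during training.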


Results from the Paper


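| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Visual Place Recognition | Gardens Point | CosPlace | Recall@1 | 74.00 | #4 |
| Visual Place Recognition | Hawkins | CosPlace | Recall@1 | 31.36 | #5 |
| Visual Place Recognition | Mapillary val | CosPlace (ResNet-101 2048-D) | Recall@1 | 86.7 | #2 |
| Visual Place Recognition | Mapillary val | CosPlace (ResNet-101 2048-D) | Recall@5 | 92.1 | #2 |
| Visual Place Recognition | Mapillary val | CosPlace (ResNet-101 2048-D) | Recall@10 | 93.4 | #2 |
| Visual Place Recognition | Mapillary val | CosPlace | Recall@5 | 89.9 | #6 |
| Visual Place Recognition | Mapillary val | CosPlace | Recall@10 | 91.8 | #5 |
| Visual Place Recognition | MSLS | CosPlace | Recall@1 | 79.6 | #1 |
| Visual Place Recognition | Nardo-Air R | CosPlace | Recall@1 | 91.55 | #2 |
| Visual Place Recognition | Pittsburgh-250k-test | CosPlace | Recall@1 | 91.5 | #3 |
| Visual Place Recognition | Pittsburgh-250k-test | CosPlace | Recall@5 | 96.9 | #4 |
| Visual Place Recognition | Pittsburgh-250k-test | CosPlace | Recall@10 | 97.9 | #4 |
| Visual Place Recognition | Pittsburgh-30k-test | CosPlace (ResNet-101 2048-D) | Recall@1 | 90.4 | #3 |
| Visual Place Recognition | Pittsburgh-30k-test | CosPlace (ResNet-101 2048-D) | Recall@5 | 95.7 | #2 |
| Visual Place Recognition | Pittsburgh-30k-test | CosPlace (ResNet-101 2048-D) | Recall@10 | 96.7 | #1 |
| Visual Place Recognition | Pittsburgh-30k-test | CosPlace | Recall@1 | 90.45 | #2 |
| Visual Place Recognition | SF-XL test v1 | CosPlace | Recall@1 | 64.7 | #1 |
| Visual Place Recognition | SF-XL test v1 | CosPlace | Recall@5 | 73.3 | #1 |
| Visual Place Recognition | SF-XL test v1 | CosPlace | Recall@10 | 76.6 | #1 |
| Visual Place Recognition | SF-XL test v2 | CosPlace | Recall@1 | 83.4 | #1 |
| Visual Place Recognition | SF-XL test v2 | CosPlace | Recall@5 | 91.6 | #1 |
| Visual Place Recognition | SF-XL test v2 | CosPlace | Recall@10 | 94.1 | #1 |
| Visual Place Recognition | St Lucia | CosPlace | Recall@1 | 99.59 | #2 |
| Visual Place Recognition | St Lucia | CosPlace | Recall@5 | 99.9 | #1 |
| Visual Place Recognition | St Lucia | CosPlace | Recall@10 | 99.9 | #1 |
| Visual Place Recognition | Tokyo247 | CosPlace | Recall@1 | 82.2 | #2 |
| Visual Place Recognition | Tokyo247 | CosPlace (ResNet-101 2048-D) | Recall@5 | 95.9 | #1 |
| Visual Place Recognition | Tokyo247 | CosPlace (ResNet-101 2048-D) | Recall@10 | 96.5 | #1 |
| Visual Place Recognition | VP-Air | CosPlace | Recall@1 | 8.12 | #6 |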
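
The Recall@N figures reported here can be read as follows: a query counts as correctly localized if at least one of its N nearest database images (by descriptor distance) was taken close enough to the query's true position. The sketch below is a minimal, brute-force illustration of that computation. The function name, the Euclidean descriptor distance, and the 25 m threshold are assumptions made for the example; individual benchmarks define a correct match in their own way.

```python
import math


def recall_at_n(query_descs, query_pos, db_descs, db_pos,
                n_values=(1, 5, 10), threshold_m=25.0):
    """Return {N: Recall@N in percent} using brute-force nearest-neighbour search.

    Descriptors and positions are sequences of equal-length numeric tuples.
    threshold_m is an assumed distance below which a retrieved image is correct.
    """
    hits = {n: 0 for n in n_values}
    for q_desc, q_xy in zip(query_descs, query_pos):
        # Rank database images from most to least similar to the query.
        ranked = sorted(range(len(db_descs)),
                        key=lambda i: math.dist(q_desc, db_descs[i]))
        for n in n_values:
            # A hit at N needs one correct image among the top N results.
            if any(math.dist(q_xy, db_pos[i]) <= threshold_m for i in ranked[:n]):
                hits[n] += 1
    return {n: 100.0 * hits[n] / len(query_descs) for n in n_values}


# Toy data: 2-D descriptors and planar positions in metres (made up).
db_descs = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
db_pos = [(0.0, 0.0), (100.0, 0.0), (10.0, 0.0)]
query_descs = [(0.1, 0.0), (0.9, 0.0)]
query_pos = [(5.0, 0.0), (0.0, 0.0)]
print(recall_at_n(query_descs, query_pos, db_descs, db_pos, n_values=(1, 2)))
```

In the toy data, the first query is matched correctly by its top result, while the second only finds a correct image at rank 2, so Recall@1 is lower than Recall@2. At real database sizes the exhaustive sort would normally be replaced by an approximate or indexed nearest-neighbour search.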

Results from Other Papers


| Task | Dataset | Model | Metric Name | Metric Value | Rank |
|---|---|---|---|---|---|
| Visual Place Recognition | 17 Places | CosPlace | Recall@1 | 61.08 | #6 |
| Visual Place Recognition | Baidu Mall | CosPlace | Recall@1 | 41.62 | #7 |
| Visual Place Recognition | Laurel Caverns | CosPlace | Recall@1 | 24.11 | #7 |
| Visual Place Recognition | Mid-Atlantic Ridge | CosPlace | Recall@1 | 20.79 | #7 |
| Visual Place Recognition | Nardo-Air | CosPlace | Recall@1 | 0 | #7 |
| Visual Place Recognition | Oxford RobotCar Dataset | CosPlace | Recall@1 | 91.10 | #2 |
