BoQ: A Place is Worth a Bag of Learnable Queries

In visual place recognition, accurately identifying and matching images of locations under varying environmental conditions and viewpoints remains a significant challenge. In this paper, we introduce a new technique, called Bag-of-Queries (BoQ), which learns a set of global queries designed to capture universal place-specific attributes. Unlike existing methods that employ self-attention and generate the queries directly from the input features, BoQ employs distinct learnable global queries, which probe the input features via cross-attention, ensuring consistent information aggregation. In addition, our technique provides an interpretable attention mechanism and integrates with both CNN and Vision Transformer backbones. The performance of BoQ is demonstrated through extensive experiments on 14 large-scale benchmarks. It consistently outperforms current state-of-the-art techniques including NetVLAD, MixVPR and EigenPlaces. Moreover, as a global retrieval technique (one-stage), BoQ surpasses two-stage retrieval methods, such as Patch-NetVLAD, TransVPR and R2Former, all while being orders of magnitude faster and more efficient. The code and model weights are publicly available at https://github.com/amaralibey/Bag-of-Queries.
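The core idea in the abstract, a fixed set of learned queries that probe the backbone features through cross-attention, can be illustrated with a minimal single-head sketch. This is not the authors' implementation (see the linked repository for that). The function name `cross_attend`, the projection matrices `w_k` and `w_v`, all dimensions, the random stand-in values for the "learned" queries, and the final flatten-and-L2-normalize step are assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attend(queries, features, w_k, w_v):
    """Single-head cross-attention (hypothetical helper).

    queries:  (num_queries, dim) input-independent query vectors
    features: (num_tokens, dim)  local features from a backbone
    Returns the aggregated outputs and the attention weights.
    """
    keys = features @ w_k
    values = features @ w_v
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    weights = softmax(scores, axis=-1)  # (num_queries, num_tokens)
    return weights @ values, weights


rng = np.random.default_rng(0)
dim, num_queries, num_tokens = 16, 4, 49  # illustrative sizes only

# In a trained model these would be learned parameters; random here.
queries = rng.normal(size=(num_queries, dim))
w_k = rng.normal(size=(dim, dim)) / np.sqrt(dim)
w_v = rng.normal(size=(dim, dim)) / np.sqrt(dim)

# Stand-in for the feature map of one image, flattened to tokens.
features = rng.normal(size=(num_tokens, dim))

out, weights = cross_attend(queries, features, w_k, w_v)

# Assumed aggregation: concatenate query outputs and L2-normalize.
descriptor = out.reshape(-1)
descriptor = descriptor / np.linalg.norm(descriptor)
```

Because the queries do not depend on the input image, the same query vectors are applied to every image, which is the property the abstract describes as consistent information aggregation. Each row of `weights` shows which feature tokens a given query attends to.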

Published at CVPR 2024.
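| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Visual Place Recognition | AmsterTime | BoQ (ResNet-50) | Recall@1 | 52.2 | #4 |
| Visual Place Recognition | AmsterTime | BoQ | Recall@1 | 63.0 | #2 |
| Visual Place Recognition | AmsterTime | BoQ | Recall@5 | 81.6 | #1 |
| Visual Place Recognition | AmsterTime | BoQ | Recall@10 | 85.1 | #1 |
| Visual Place Recognition | Eynsham | BoQ (ResNet-50) | Recall@1 | 91.3 | #2 |
| Visual Place Recognition | Eynsham | BoQ | Recall@1 | 92.2 | #1 |
| Visual Place Recognition | Eynsham | BoQ | Recall@5 | 95.6 | #1 |
| Visual Place Recognition | Eynsham | BoQ | Recall@10 | 96.4 | #1 |
| Visual Place Recognition | Mapillary test | BoQ | Recall@1 | 79 | #1 |
| Visual Place Recognition | Mapillary test | BoQ | Recall@5 | 90.3 | #1 |
| Visual Place Recognition | Mapillary test | BoQ | Recall@10 | 92 | #1 |
| Visual Place Recognition | Mapillary val | BoQ | Recall@1 | 93.8 | #1 |
| Visual Place Recognition | Mapillary val | BoQ | Recall@5 | 96.8 | #2 |
| Visual Place Recognition | Mapillary val | BoQ | Recall@10 | 97 | #3 |
| Visual Place Recognition | Mapillary val | BoQ (ResNet-50) | Recall@1 | 91.2 | #4 |
| Visual Place Recognition | Mapillary val | BoQ (ResNet-50) | Recall@5 | 95.3 | #5 |
| Visual Place Recognition | Mapillary val | BoQ (ResNet-50) | Recall@10 | 96.1 | #5 |
| Visual Place Recognition | Nordland | BoQ | Recall@1 | 90.6 | #2 |
| Visual Place Recognition | Nordland | BoQ | Recall@5 | 96.0 | #3 |
| Visual Place Recognition | Nordland | BoQ | Recall@10 | 97.5 | #1 |
| Visual Place Recognition | Nordland | BoQ (ResNet-50) | Recall@1 | 83.1 | #5 |
| Visual Place Recognition | Pittsburgh-250k-test | BoQ (ResNet-50) | Recall@1 | 95 | #4 |
| Visual Place Recognition | Pittsburgh-250k-test | BoQ (ResNet-50) | Recall@5 | 98.5 | #3 |
| Visual Place Recognition | Pittsburgh-250k-test | BoQ (ResNet-50) | Recall@10 | 99.1 | #2 |
| Visual Place Recognition | Pittsburgh-250k-test | BoQ | Recall@1 | 96.6 | #1 |
| Visual Place Recognition | Pittsburgh-250k-test | BoQ | Recall@5 | 99.1 | #2 |
| Visual Place Recognition | Pittsburgh-250k-test | BoQ | Recall@10 | 99.5 | #1 |
| Visual Place Recognition | Pittsburgh-30k-test | BoQ | Recall@1 | 93.7 | #2 |
| Visual Place Recognition | Pittsburgh-30k-test | BoQ | Recall@5 | 97.1 | #4 |
| Visual Place Recognition | Pittsburgh-30k-test | BoQ | Recall@10 | 97.9 | #1 |
| Visual Place Recognition | Pittsburgh-30k-test | BoQ (ResNet-50) | Recall@1 | 92.4 | #7 |
| Visual Place Recognition | San Francisco Landmark Dataset | BoQ | Recall@1 | 93.6 | #1 |
| Visual Place Recognition | San Francisco Landmark Dataset | BoQ | Recall@5 | 95.8 | #1 |
| Visual Place Recognition | San Francisco Landmark Dataset | BoQ | Recall@10 | 96.5 | #1 |
| Visual Place Recognition | SPED | BoQ (ResNet-50) | Recall@1 | 86.5 | #3 |
| Visual Place Recognition | SPED | BoQ (ResNet-50) | Recall@5 | 93.4 | #3 |
| Visual Place Recognition | SPED | BoQ (ResNet-50) | Recall@10 | 95.7 | #3 |
| Visual Place Recognition | SPED | BoQ | Recall@1 | 92.5 | #1 |
| Visual Place Recognition | SPED | BoQ | Recall@5 | 95.9 | #2 |
| Visual Place Recognition | SPED | BoQ | Recall@10 | 96.7 | #1 |
| Visual Place Recognition | St Lucia | BoQ | Recall@5 | 100 | #1 |
| Visual Place Recognition | St Lucia | BoQ | Recall@10 | 100 | #1 |
| Visual Place Recognition | St Lucia | BoQ (DINOv2) | Recall@1 | 100.0 | #1 |
| Visual Place Recognition | St Lucia | BoQ (DINOv2) | Recall@5 | 100 | #1 |
| Visual Place Recognition | SVOX-Night | BoQ (ResNet-50) | Recall@1 | 87.1 | #1 |
| Visual Place Recognition | SVOX-Overcast | BoQ (ResNet-50) | Recall@1 | 97.8 | #1 |
| Visual Place Recognition | SVOX-Rain | BoQ (ResNet-50) | Recall@1 | 96.2 | #1 |
| Visual Place Recognition | SVOX-Snow | BoQ (ResNet-50) | Recall@1 | 98.7 | #1 |
| Visual Place Recognition | SVOX-Sun | BoQ (ResNet-50) | Recall@1 | 95.9 | #1 |
| Visual Place Recognition | Tokyo247 | BoQ | Recall@1 | 98.1 | #2 |
| Visual Place Recognition | Tokyo247 | BoQ | Recall@5 | 98.1 | #2 |
| Visual Place Recognition | Tokyo247 | BoQ | Recall@10 | 98.7 | #1 |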
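The results above are reported as Recall@K. The page does not define the metric, so the sketch below assumes the definition commonly used in retrieval benchmarks: the percentage of queries for which at least one correct reference appears among the top K retrieved results. The function name `recall_at_k`, the toy rankings, and the ground-truth sets are hypothetical, and how a "correct" reference is determined (for example, by a distance threshold) varies by dataset and is not modeled here.

```python
def recall_at_k(rankings, ground_truth, k):
    """Percentage of queries with at least one correct match in the top k.

    rankings:     dict mapping query id -> list of reference ids, best first
    ground_truth: dict mapping query id -> set of correct reference ids
    """
    hits = 0
    for query_id, ranked_refs in rankings.items():
        # A query counts as a hit if any of its top-k results is correct.
        if any(ref in ground_truth[query_id] for ref in ranked_refs[:k]):
            hits += 1
    return 100.0 * hits / len(rankings)


# Toy data (hypothetical): four queries, three retrieved references each.
rankings = {
    "q1": ["r3", "r1", "r7"],  # correct match at rank 1
    "q2": ["r5", "r2", "r9"],  # correct match at rank 2
    "q3": ["r4", "r8", "r6"],  # correct match at rank 3
    "q4": ["r1", "r2", "r3"],  # no correct match retrieved
}
ground_truth = {
    "q1": {"r3"},
    "q2": {"r2"},
    "q3": {"r6"},
    "q4": {"r9"},
}

r1 = recall_at_k(rankings, ground_truth, 1)  # 25.0
r2 = recall_at_k(rankings, ground_truth, 2)  # 50.0
r3 = recall_at_k(rankings, ground_truth, 3)  # 75.0
```

Under this definition Recall@K can never decrease as K grows, which matches the ordering Recall@1 ≤ Recall@5 ≤ Recall@10 seen in every row group of the table.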

Methods