AnyLoc: Towards Universal Visual Place Recognition

Visual Place Recognition (VPR) is vital for robot localization. To date, the most performant VPR approaches are environment- and task-specific: while they exhibit strong performance in structured environments (predominantly urban driving), their performance degrades severely in unstructured environments, rendering most approaches brittle in real-world deployment. In this work, we develop a universal solution to VPR -- a technique that works across a broad range of structured and unstructured environments (urban, outdoor, indoor, aerial, underwater, and subterranean) without any re-training or fine-tuning. We demonstrate that general-purpose feature representations derived from off-the-shelf self-supervised models, with no VPR-specific training, are the right substrate upon which to build such a universal VPR solution. Combining these derived features with unsupervised feature aggregation enables our suite of methods, AnyLoc, to achieve up to 4x higher performance than existing approaches. We obtain a further 6% improvement by characterizing the semantic properties of these features, uncovering unique domains that encapsulate datasets from similar environments. Our detailed experiments and analysis lay a foundation for building VPR solutions that may be deployed anywhere, anytime, and across any view. We encourage readers to explore our project page and interactive demos: https://anyloc.github.io/.

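The recipe behind the AnyLoc-VLAD-DINOv2 entries below is compact enough to sketch: extract dense per-patch features from an off-the-shelf self-supervised backbone, then aggregate them into a single global descriptor with unsupervised VLAD. The sketch below assumes DINOv2 via torch.hub and uses the final-layer patch tokens for simplicity (the paper studies specific layers and facets); the vocabulary size and helper names are illustrative, not the authors' exact configuration.

```python
# A sketch of the AnyLoc recipe, not the authors' exact implementation:
# off-the-shelf self-supervised features + unsupervised VLAD aggregation.
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

# Off-the-shelf DINOv2 backbone; no VPR-specific training or fine-tuning.
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vitg14").eval()

@torch.no_grad()
def patch_features(image: torch.Tensor) -> np.ndarray:
    """image: (1, 3, H, W), H and W divisible by 14. Returns (N, D) patch features."""
    tokens = model.forward_features(image)["x_norm_patchtokens"].squeeze(0)
    return F.normalize(tokens, dim=-1).cpu().numpy()

def fit_vocabulary(domain_feats: np.ndarray, k: int = 32) -> np.ndarray:
    """Unsupervised VLAD vocabulary: k-means over features pooled from the
    target domain's database images (k = 32 is an illustrative choice)."""
    return KMeans(n_clusters=k, n_init=10).fit(domain_feats).cluster_centers_

def vlad(feats: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Aggregate (N, D) local features into one (k * D,) global descriptor."""
    k, d = centers.shape
    # Hard-assign each feature to its nearest cluster center.
    assign = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
    desc = np.zeros((k, d), dtype=np.float32)
    for c in range(k):
        members = feats[assign == c]
        if len(members):
            resid = (members - centers[c]).sum(0)              # residual sum
            desc[c] = resid / (np.linalg.norm(resid) + 1e-12)  # intra-normalize
    flat = desc.reshape(-1)
    return flat / (np.linalg.norm(flat) + 1e-12)               # final L2-normalize
```

A query image is then matched by comparing its VLAD descriptor against the database descriptors, e.g. by cosine similarity, as in the Recall@1 sketch after the results table.
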
Results

All entries below report Recall@1 on the Visual Place Recognition task.

| Dataset | Model | Recall@1 | Global Rank |
|---|---|---|---|
| 17 Places | CLIP | 59.36 | #7 |
| 17 Places | AnyLoc-VLAD-DINOv2 | 65.02 | #1 |
| Baidu Mall | AnyLoc-VLAD-DINOv2 | 75.22 | #1 |
| Baidu Mall | CLIP | 56.02 | #3 |
| Gardens Point | AnyLoc-VLAD-DINOv2 | 95.5 | #1 |
| Gardens Point | CLIP | 42.5 | #7 |
| Hawkins | AnyLoc-VLAD-DINOv2 | 65.25 | #1 |
| Hawkins | CLIP | 33.05 | #4 |
| Laurel Caverns | AnyLoc-VLAD-DINOv2 | 61.61 | #1 |
| Laurel Caverns | CLIP | 36.61 | #5 |
| Mid-Atlantic Ridge | CLIP | 25.74 | #3 |
| Mid-Atlantic Ridge | AnyLoc-VLAD-DINOv2 | 34.65 | #1 |
| Nardo-Air | AnyLoc-VLAD-DINOv2 | 76.06 | #1 |
| Nardo-Air | CLIP | 42.25 | #4 |
| Nardo-Air R | AnyLoc-VLAD-DINOv2 | 85.92 | #3 |
| Nardo-Air R | CLIP | 61.97 | #7 |
| Nardo-Air R | AnyLoc-VLAD-DINO | 94.37 | #1 |
| Oxford RobotCar Dataset | CLIP | 34.55 | #6 |
| Oxford RobotCar Dataset | AnyLoc-VLAD-DINOv2 | 98.95 | #1 |
| Pittsburgh-30k-test | AnyLoc-VLAD-DINOv2 | 87.66 | #7 |
| Pittsburgh-30k-test | CLIP | 54.97 | #12 |
| St Lucia | CLIP | 62.7 | #6 |
| St Lucia | AnyLoc-VLAD-DINOv2 | 96.17 | #4 |
| VP-Air | CLIP | 36.59 | #3 |
| VP-Air | AnyLoc-VLAD-DINOv2 | 66.74 | #1 |

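Every entry above reports Recall@1: the percentage of queries whose single nearest database descriptor is a correct match under the dataset's ground-truth localization criterion. A minimal sketch of the evaluation, assuming L2-normalized global descriptors and a per-query set of correct database indices (`gt_matches` is a hypothetical name):

```python
import numpy as np

def recall_at_1(q_desc: np.ndarray, db_desc: np.ndarray, gt_matches: list[set]) -> float:
    """q_desc: (Q, D) query and db_desc: (M, D) database descriptors, L2-normalized.
    gt_matches[i] is the set of database indices that count as correct for query i."""
    sims = q_desc @ db_desc.T    # cosine similarity, since descriptors are normalized
    top1 = sims.argmax(axis=1)   # index of the nearest database image per query
    hits = sum(int(top1[i]) in gt_matches[i] for i in range(len(q_desc)))
    return 100.0 * hits / len(q_desc)
```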