Visual Localization
151 papers with code • 5 benchmarks • 20 datasets
Visual Localization is the problem of estimating the camera pose of a given image relative to a visual representation of a known scene.
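At its core, this means finding the rotation and translation that best explain the observed image given known scene geometry, typically by minimizing reprojection error. The sketch below (a toy numpy illustration, not any particular paper's method; the scene points, intrinsics, and poses are made-up values) projects known 3D scene points through a candidate camera pose and scores the pose by mean pixel error:

```python
import numpy as np

def project(points_3d, R, t, K):
    """Project 3D scene points into the image with pose (R, t) and intrinsics K."""
    cam = points_3d @ R.T + t          # world frame -> camera frame
    uv = cam @ K.T                     # apply pinhole intrinsics
    return uv[:, :2] / uv[:, 2:3]      # perspective divide

def reprojection_error(points_3d, points_2d, R, t, K):
    """Mean pixel distance between observed and predicted projections."""
    pred = project(points_3d, R, t, K)
    return np.linalg.norm(pred - points_2d, axis=1).mean()

# Toy scene: 3D points in front of the camera and a ground-truth pose.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (50, 3)) + np.array([0.0, 0.0, 5.0])
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])
R_true, t_true = np.eye(3), np.array([0.1, -0.2, 0.3])
obs = project(X, R_true, t_true, K)    # the "query image" observations

# The true pose yields zero reprojection error; a perturbed pose does not.
t_wrong = t_true + np.array([0.5, 0.0, 0.0])
print(reprojection_error(X, obs, R_true, t_true, K))   # 0.0
print(reprojection_error(X, obs, R_true, t_wrong, K))  # tens of pixels
```

Real systems solve for the pose directly (e.g. PnP inside a RANSAC loop over 2D-3D matches) rather than scoring candidates, but the objective being minimized is the same.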
Latest papers
Prompting Segmentation with Sound Is Generalizable Audio-Visual Source Localizer
Never having seen an object and heard its sound simultaneously, can the model still accurately localize its visual position from the input audio?
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
In this work, we introduce the Qwen-VL series, a set of large-scale vision-language models (LVLMs) designed to perceive and understand both texts and images.
OFVL-MS: Once for Visual Localization across Multiple Indoor Scenes
In this work, we seek to predict camera poses across scenes in a multi-task learning manner, where we view the localization of each scene as a new task.

D2S: Representing local descriptors and global scene coordinates for camera relocalization
In this study, we propose a direct learning-based approach that utilizes a simple network named D2S to represent local descriptors and their scene coordinates.
ResMatch: Residual Attention Learning for Local Feature Matching
In order to facilitate the learning of matching and filtering, we inject the similarity of descriptors and their relative positions into the cross- and self-attention scores, respectively.
LightGlue: Local Feature Matching at Light Speed
We introduce LightGlue, a deep neural network that learns to match local features across images.
Yes, we CANN: Constrained Approximate Nearest Neighbors for local feature-based visual localization
In this paper, we take a step back from this assumption and propose Constrained Approximate Nearest Neighbors (CANN), a joint solution of k-nearest-neighbors across both the geometry and appearance space using only local features.
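The papers above all build on the same primitive: nearest-neighbour search over local feature descriptors, filtered to keep reliable correspondences. A minimal sketch of that primitive (generic mutual-nearest-neighbour matching with a Lowe-style ratio test, not the LightGlue or CANN algorithm; the descriptor arrays are synthetic) might look like:

```python
import numpy as np

def mutual_nn_matches(desc_a, desc_b, ratio=0.8):
    """Match two descriptor sets by mutual nearest neighbour + ratio test."""
    # Pairwise Euclidean distances between all descriptor pairs.
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=-1)
    nn_ab = d.argmin(axis=1)               # best match in B for each A
    nn_ba = d.argmin(axis=0)               # best match in A for each B
    matches = []
    for i, j in enumerate(nn_ab):
        if nn_ba[j] != i:                  # keep only mutual nearest neighbours
            continue
        two = np.partition(d[i], 1)[:2]    # two smallest distances for the ratio test
        if two[0] < ratio * two[1]:        # first NN must clearly beat the second
            matches.append((i, j))
    return matches

# Toy data: B contains noisy, shuffled copies of A plus unmatched distractors.
rng = np.random.default_rng(1)
A = rng.normal(size=(10, 128))
perm = rng.permutation(10)
B = np.vstack([A[perm] + 0.01 * rng.normal(size=(10, 128)),
               rng.normal(size=(5, 128))])
matches = mutual_nn_matches(A, B)
print(len(matches))                        # all 10 true pairs recovered
```

Learned matchers like LightGlue replace the fixed distance-and-ratio heuristic with attention over both descriptor sets, and CANN constrains the nearest-neighbour search jointly over geometry and appearance, but both operate on this same matching problem.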
Illumination-insensitive Binary Descriptor for Visual Measurement Based on Local Inter-patch Invariance
Existing binary descriptors may not perform well for long-term visual measurement tasks due to their sensitivity to illumination variations.
Eiffel Tower: A Deep-Sea Underwater Dataset for Long-Term Visual Localization
This paper presents a new deep-sea dataset to benchmark underwater long-term visual localization.
Privacy-Preserving Representations are not Enough -- Recovering Scene Content from Camera Poses
In this paper, we show that an attacker can learn about details of a scene without any access by simply querying a localization service.