Visual Localization is the problem of estimating the camera pose of a given image relative to a visual representation of a known scene.
In this paper we propose HF-Net, a hierarchical localization approach based on a monolithic CNN that simultaneously predicts local features and global descriptors for accurate 6-DoF localization.
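The coarse-to-fine idea behind such hierarchical localization can be sketched in a few lines: a global descriptor first retrieves a handful of candidate database images, and only those candidates are passed to the expensive local matching and 6-DoF pose stage. The sketch below is a toy illustration under assumed descriptor dimensions, not the authors' implementation.

```python
import numpy as np

# Toy sketch of the coarse retrieval stage of hierarchical localization:
# 1) match the query's global descriptor against a database,
# 2) keep only the top-k images for subsequent local feature matching
#    and 6-DoF pose estimation (not shown here).

rng = np.random.default_rng(0)

# Global descriptors of 100 database images, L2-normalized (64-D is an
# arbitrary choice for this example).
db_global = rng.normal(size=(100, 64))
db_global /= np.linalg.norm(db_global, axis=1, keepdims=True)

# A query that is a slightly perturbed copy of database image 42.
query_global = db_global[42] + 0.01 * rng.normal(size=64)
query_global /= np.linalg.norm(query_global)

# Coarse stage: cosine similarity, keep the top-k candidates.
scores = db_global @ query_global
top_k = np.argsort(-scores)[:5]
print(top_k)  # image 42 should rank first
```

Only these k candidates then need local feature matching, which is what makes the hierarchical scheme cheap enough for large scenes.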
This mental model captures geometric and semantic aspects of the scene, describes the environment at multiple levels of abstraction (e.g., objects, rooms, buildings), and includes static and dynamic entities and their relations (e.g., a person is in a room at a given time).
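A minimal way to picture such a layered scene representation is a tree of nodes at different abstraction levels, plus timestamped relations for dynamic entities. The sketch below is purely illustrative; the node names, levels, and relation format are assumptions, not the paper's data structures.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a hierarchical scene representation: nodes at
# different levels of abstraction linked into a tree, with separate
# timestamped relations for dynamic entities.

@dataclass
class Node:
    name: str
    level: str                          # e.g. "object", "room", "building"
    children: list = field(default_factory=list)

building = Node("building_1", "building")
room = Node("kitchen", "room")
person = Node("person_1", "object")     # a dynamic entity

building.children.append(room)
room.children.append(person)

# A timestamped relation such as "person_1 is in kitchen at t=10.0":
relations = [("person_1", "in", "kitchen", 10.0)]

print(building.level, room.level, person.level)
```

Queries at different granularities ("which building?", "which room?", "who is here now?") then just walk different depths of the same structure.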
We present a novel method for local image feature matching.
The most promising approach is inspired by reinforcement learning, namely to replace the deterministic hypothesis selection by a probabilistic selection for which we can derive the expected loss w.r.t. all learnable parameters.
Popular research areas like autonomous driving and augmented reality have renewed the interest in image-based camera localization.
In contrast, we learn hypothesis search in a principled fashion that lets us optimize an arbitrary task loss during training, leading to large improvements on classic computer vision tasks.
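The key trick described above, replacing a hard argmax over hypotheses with a probabilistic selection so that the expected task loss becomes differentiable, can be illustrated with a toy example. Everything below (the loss values, the scores, the softmax choice) is an assumed illustration of the general technique, not the paper's code.

```python
import numpy as np

# Toy sketch: instead of deterministically picking the best-scoring pose
# hypothesis, sample it from a softmax over learnable scores. The expected
# task loss E[loss] is then differentiable w.r.t. the scores.

losses = np.array([0.9, 0.2, 0.5])   # task loss of each pose hypothesis
scores = np.array([1.0, 2.0, 0.5])   # learnable selection scores

def expected_loss(scores, losses):
    p = np.exp(scores - scores.max())   # numerically stable softmax
    p /= p.sum()
    return (p * losses).sum()

# Analytic gradient of the expectation w.r.t. the scores:
#   dE/ds_i = p_i * (loss_i - E[loss])
p = np.exp(scores - scores.max())
p /= p.sum()
grad = p * (losses - expected_loss(scores, losses))

print(expected_loss(scores, losses), grad)
```

Gradient descent on the scores then lowers the expected loss by shifting probability mass toward low-loss hypotheses, which is what lets an arbitrary task loss be optimized end to end.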
Many applications of Visual SLAM, such as augmented reality, virtual reality, robotics or autonomous driving, require versatile, robust and precise solutions, most often with real-time capability.