R2D2: Repeatable and Reliable Detector and Descriptor

Interest point detection and local feature description are fundamental steps in many computer vision applications. Classical approaches follow a detect-then-describe paradigm in which separate handcrafted methods first identify repeatable keypoints and then represent them with a local descriptor. Neural networks trained with metric learning losses have recently caught up with these techniques, focusing on learning repeatable saliency maps for keypoint detection or learning descriptors at the detected keypoint locations. In this work, we argue that repeatable regions are not necessarily discriminative and can therefore lead to the selection of suboptimal keypoints. Furthermore, we claim that descriptors should be learned only in regions where matching can be performed with high confidence. We thus propose to jointly learn keypoint detection and description together with a predictor of the local descriptor's discriminativeness. This allows the model to avoid ambiguous areas and leads to reliable keypoint detection and description. Our detect-and-describe approach simultaneously outputs sparse, repeatable, and reliable keypoints that outperform state-of-the-art detectors and descriptors on the HPatches dataset and on the recent Aachen Day-Night localization benchmark.
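The abstract describes keeping only keypoints that are both repeatable and reliable. A minimal sketch of that selection step, assuming the network has already produced dense per-pixel repeatability and reliability maps; the function name, the product scoring rule, and the top-k cutoff here are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def select_keypoints(repeatability, reliability, k=500):
    """Hypothetical sketch: score each pixel by the product of its
    repeatability and reliability, then keep the top-k positions.

    repeatability, reliability: 2-D arrays of the same shape, values in [0, 1].
    Returns a list of (row, col) keypoint coordinates, best score first.
    """
    # A pixel is a good keypoint only if BOTH maps are high there,
    # which the product captures: either factor near zero kills the score.
    score = repeatability * reliability
    # Flatten, sort descending, keep the k best indices.
    flat = np.argsort(score, axis=None)[::-1][:k]
    ys, xs = np.unravel_index(flat, score.shape)
    return list(zip(ys.tolist(), xs.tolist()))
```

In this toy scoring, a highly repeatable corner in a repetitive texture (high repeatability, low reliability) is suppressed, which is the suboptimal-keypoint case the abstract argues against.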

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank | Uses Extra Training Data |
|---|---|---|---|---|---|---|
| Camera Localization | Aachen Day-Night benchmark | R2D2 WASF N8 (full scale, 10K kpts) | Acc @ 0.5m, 2° | 45.9 | # 2 | |
| Camera Localization | Aachen Day-Night benchmark | R2D2 WASF N8 (full scale, 10K kpts) | Acc @ 1m, 5° | 66.3 | # 2 | |
| Camera Localization | Aachen Day-Night benchmark | R2D2 WASF N8 (full scale, 10K kpts) | Acc @ 5m, 10° | 88.8 | # 2 | |
| Image Matching | IMC PhotoTourism | R2D2 | mean average accuracy @ 10 | 0.56345 | # 7 | |

