Hyperbolic Vision Transformers: Combining Improvements in Metric Learning

Metric learning aims to learn a highly discriminative model that encourages the embeddings of similar classes to be close in the chosen metric and those of dissimilar classes to be pushed apart. The common recipe is to use an encoder to extract embeddings and a distance-based loss function to match the representations; usually, the Euclidean distance is utilized. An emerging interest in learning hyperbolic data embeddings suggests that hyperbolic geometry can be beneficial for natural data. Following this line of work, we propose a new hyperbolic-based model for metric learning. At the core of our method is a vision transformer with output embeddings mapped to hyperbolic space. These embeddings are directly optimized using a modified pairwise cross-entropy loss. We evaluate the proposed model with six different formulations on four datasets, achieving new state-of-the-art performance. The source code is available at https://github.com/htdt/hyp_metric.
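The pipeline described in the abstract (encoder output mapped to hyperbolic space, then a pairwise cross-entropy loss over distances) can be illustrated with a minimal sketch. It is not the released implementation. It assumes the Poincaré ball model with curvature parameter `c` and a softmax temperature `tau`, neither of which is specified in the abstract, and the function names `expmap0`, `mobius_add`, `hyp_dist`, and `pairwise_ce_loss` are hypothetical. Plain Python lists stand in for the vision transformer's output embeddings.

```python
import math

def expmap0(v, c=0.1):
    """Exponential map at the origin: Euclidean vector -> Poincare ball (curvature -c)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    scale = math.tanh(math.sqrt(c) * norm) / (math.sqrt(c) * norm)
    return [scale * x for x in v]

def mobius_add(x, y, c=0.1):
    """Mobius addition of two points on the Poincare ball."""
    xy = sum(a * b for a, b in zip(x, y))
    x2 = sum(a * a for a in x)
    y2 = sum(b * b for b in y)
    coef_x = 1 + 2 * c * xy + c * y2
    coef_y = 1 - c * x2
    denom = 1 + 2 * c * xy + c * c * x2 * y2
    return [(coef_x * a + coef_y * b) / denom for a, b in zip(x, y)]

def hyp_dist(x, y, c=0.1):
    """Geodesic distance between two points on the Poincare ball."""
    diff = mobius_add([-a for a in x], y, c)
    norm = math.sqrt(sum(d * d for d in diff))
    arg = min(math.sqrt(c) * norm, 1 - 1e-7)  # keep atanh finite near the boundary
    return 2 / math.sqrt(c) * math.atanh(arg)

def pairwise_ce_loss(embeddings, labels, c=0.1, tau=0.2):
    """Mean cross-entropy over positive pairs (i, j), with logits -d(z_i, z_k) / tau for k != i."""
    z = [expmap0(v, c) for v in embeddings]  # assumed: encoder outputs are Euclidean
    n = len(z)
    total, count = 0.0, 0
    for i in range(n):
        logits = {k: -hyp_dist(z[i], z[k], c) / tau for k in range(n) if k != i}
        m = max(logits.values())
        # Numerically stable log-sum-exp over all other samples in the batch.
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits.values()))
        for j, logit in logits.items():
            if labels[j] == labels[i]:
                total += log_denom - logit  # -log softmax probability of the positive
                count += 1
    return total / max(count, 1)

# Toy batch: two samples per class, classes placed on opposite sides of the origin.
batch = [[1.0, 0.0], [1.1, 0.0], [-1.0, 0.0], [-1.1, 0.0]]
loss = pairwise_ce_loss(batch, labels=[0, 0, 1, 1])
```

In this toy batch the loss is small when the labels agree with the geometric grouping and grows when they do not, which is the behavior a distance-based metric learning objective is expected to have. The repository linked above remains the reference for the actual loss and hyperparameters.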

Published at CVPR 2022.

Results from the Paper


Task             Dataset                   Model          Metric  Value  Global Rank
---------------  ------------------------  -------------  ------  -----  -----------
Metric Learning  CARS196                   Hyp-ViT        R@1     86.5   #23
Metric Learning  CARS196                   Hyp-DINO 8x8   R@1     92.8   #2
Metric Learning  CARS196                   Hyp-DINO       R@1     89.2   #13
Metric Learning  CUB-200-2011              Hyp-ViT        R@1     85.6   #3
Metric Learning  CUB-200-2011              Hyp-DINO       R@1     80.9   #1
Metric Learning  In-Shop                   Hyp-ViT        R@1     92.5   #4
Metric Learning  In-Shop                   Hyp-DINO       R@1     92.4   #5
Metric Learning  Stanford Online Products  Hyp-DINO       R@1     85.1   #7
Metric Learning  Stanford Online Products  Hyp-ViT        R@1     85.9   #6

Methods