The Distribution Family of Similarity Distances

NeurIPS 2007 · Gertjan Burghouts, Arnold Smeulders, Jan-Mark Geusebroek ·

Assessing similarity between features is a key step in object recognition and scene categorization tasks. We argue that knowledge on the distribution of distances generated by similarity functions is crucial in deciding whether features are similar or not. Intuitively one would expect that similarities between features could arise from any distribution. In this paper, we will derive the contrary, and report the theoretical result that $L_p$-norms --a class of commonly applied distance metrics-- from one feature vector to other vectors are Weibull-distributed if the feature values are correlated and non-identically distributed. Besides these assumptions being realistic for images, we experimentally show them to hold for various popular feature extraction algorithms, for a diverse range of images. This fundamental insight opens new directions in the assessment of feature similarity, with projected improvements in object and scene recognition algorithms. Erratum: The authors of paper have declared that they have become convinced that the reasoning in the reference is too simple as a proof of their claims. As a consequence, they withdraw their theorems.

PDF Abstract