Clustering in Hilbert simplex geometry

3 Apr 2017  ·  Frank Nielsen, Ke Sun

Clustering categorical distributions in the finite-dimensional probability simplex is a fundamental task arising in many applications dealing with normalized histograms. Traditionally, the differential-geometric structures of the probability simplex have been used either by (i) setting the Riemannian metric tensor to the Fisher information matrix of the categorical distributions, or (ii) defining the dualistic information-geometric structure induced by a smooth dissimilarity measure such as the Kullback-Leibler divergence. In this work, we introduce for clustering tasks a novel computationally-friendly framework for modeling the probability simplex geometrically: the Hilbert simplex geometry. In the Hilbert simplex geometry, the distance is the non-separable Hilbert metric distance, which satisfies the property of information monotonicity and has distance level sets described by polytope boundaries. We show that both the Aitchison and Hilbert simplex distances are norm distances on normalized logarithmic representations with respect to the $\ell_2$ and variation norms, respectively. We discuss the pros and cons of these different statistical modelings, and experimentally benchmark these different kinds of geometries for center-based $k$-means and $k$-center clustering. Furthermore, since a canonical Hilbert distance can be defined on any bounded convex subset of Euclidean space, we also consider Hilbert's geometry of the elliptope of correlation matrices and study its clustering performance compared to the Frobenius and log-det divergences.
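To make the norm-distance characterization in the abstract concrete, here is a minimal sketch (not the authors' code) of the two simplex distances it mentions: the Hilbert simplex distance as the variation norm (max minus min) of the componentwise log-ratio, and the Aitchison distance as the $\ell_2$ norm between centered log-ratio representations. Function names and the example points are illustrative assumptions.

```python
import numpy as np

def hilbert_simplex_distance(p, q):
    """Hilbert metric on the open probability simplex:
    variation norm (max - min) of log(p) - log(q)."""
    r = np.log(p) - np.log(q)
    return float(np.max(r) - np.min(r))

def aitchison_distance(p, q):
    """Aitchison distance: l2 norm between centered log-ratio representations."""
    clr_p = np.log(p) - np.mean(np.log(p))
    clr_q = np.log(q) - np.mean(np.log(q))
    return float(np.linalg.norm(clr_p - clr_q))

# Example: two categorical distributions on 3 outcomes (positive, summing to 1).
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])
print(hilbert_simplex_distance(p, q), aitchison_distance(p, q))
```

Either distance could then be plugged into a generic center-based $k$-means or $k$-center routine as the dissimilarity between normalized histograms, which is the clustering setting the paper benchmarks.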
