Fast and Robust Comparison of Probability Measures in Heterogeneous Spaces

5 Feb 2020 · Ryoma Sato, Marco Cuturi, Makoto Yamada, Hisashi Kashima

Comparing two probability measures supported on heterogeneous spaces is an increasingly important problem in machine learning. Such problems arise, for instance, when comparing two populations of biological cells, each described with its own set of features, or when looking at families of word embeddings trained across different corpora or languages. For such settings, the Gromov-Wasserstein (GW) distance is often presented as the gold standard. GW is intuitive, as it quantifies whether one measure can be isomorphically mapped to the other. However, its exact computation is intractable, and most algorithms that claim to approximate it remain expensive. Building on Mémoli (2011), who proposed to represent each point in each distribution as the 1D distribution of its distances to all other points, we introduce in this paper the Anchor Energy (AE) and Anchor Wasserstein (AW) distances, which are respectively the energy and Wasserstein distances instantiated on such representations. Our main contribution is a sweep line algorithm that computes AE exactly in log-quadratic time, where a naive implementation would be cubic. This is quasi-linear w.r.t. the description of the problem itself, since the pairwise distances that describe the problem are already quadratic in the number of points. Our second contribution is the proposal of robust variants of AE and AW that use rank statistics rather than the original distances. We show that AE and AW perform well in various experimental settings at a fraction of the computational cost of popular GW approximations. Code is available at https://github.com/joisino/anchor-energy.
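To make the representation described in the abstract concrete, here is a minimal sketch of a naive, cubic-time energy distance computed on per-point distance profiles. It is not the paper's sweep line algorithm, and it is not taken from the linked repository. The function names are hypothetical, and two choices are assumptions made only for illustration: both point sets have the same size with uniform weights, and the ground cost between two distance profiles is the 1D Wasserstein-1 distance.

```python
import math


def distance_profiles(points):
    """For each point, return the sorted distances to all other points.

    Each sorted list is an empirical 1D distribution, following the
    representation attributed to Mémoli (2011) in the abstract.
    """
    return [
        sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]


def w1_sorted(a, b):
    """1D Wasserstein-1 distance between two equal-size uniform samples.

    Assumes both lists are already sorted and have the same length.
    """
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def mean_cost(profiles_a, profiles_b):
    """Average ground cost over all pairs of profiles (O(n^3) overall)."""
    total = sum(w1_sorted(p, q) for p in profiles_a for q in profiles_b)
    return total / (len(profiles_a) * len(profiles_b))


def naive_anchor_energy(xs, ys):
    """Energy distance between the two sets of distance profiles.

    Illustrative assumption: W1 is the ground cost and len(xs) == len(ys).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("this sketch needs two equal-size sets of >= 2 points")
    px, py = distance_profiles(xs), distance_profiles(ys)
    return 2 * mean_cost(px, py) - mean_cost(px, px) - mean_cost(py, py)


# A unit square and a rotated, translated copy of it.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
moved_square = [(5.0, 5.0), (5.0, 6.0), (4.0, 6.0), (4.0, 5.0)]
# Four collinear points, a genuinely different shape.
line = [(0.0, 0.0), (3.0, 0.0), (6.0, 0.0), (9.0, 0.0)]

same = naive_anchor_energy(square, moved_square)
different = naive_anchor_energy(square, line)
```

Because the profiles depend only on within-set distances, the value is unchanged by rotations and translations of either set, so `same` is zero up to rounding while `different` is strictly positive. The triple loop over pairs of profiles and their entries is what makes this version cubic in the number of points.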
