# Monge, Bregman and Occam: Interpretable Optimal Transport in High-Dimensions with Feature-Sparse Maps

Optimal transport (OT) theory focuses, among all maps $T:\mathbb{R}^d\rightarrow \mathbb{R}^d$ that can morph a probability measure onto another, on those that are the ``thriftiest'', i.e. such that the averaged cost $c(x, T(x))$ between $x$ and its image $T(x)$ be as small as possible. Many computational approaches have been proposed to estimate such Monge maps when $c$ is the $\ell_2^2$ distance, e.g., using entropic maps [Pooladian'22], or neural networks [Makkuva'20, Korotin'20]. We propose a new model for transport maps, built on a family of translation invariant costs $c(x, y):=h(x-y)$, where $h:=\tfrac{1}{2}\|\cdot\|_2^2+\tau$ and $\tau$ is a regularizer. We propose a generalization of the entropic map suitable for $h$, and highlight a surprising link tying it with the Bregman centroids of the divergence $D_h$ generated by $h$, and the proximal operator of $\tau$. We show that choosing a sparsity-inducing norm for $\tau$ results in maps that apply Occam's razor to transport, in the sense that the displacement vectors $\Delta(x):= T(x)-x$ they induce are sparse, with a sparsity pattern that varies depending on $x$. We showcase the ability of our method to estimate meaningful OT maps for high-dimensional single-cell transcription data, in the $34000$-$d$ space of gene counts for cells, without using dimensionality reduction, thus retaining the ability to interpret all displacements at the gene level.

PDF Abstract