Adaptive Tree Wasserstein Minimization for Hierarchical Generative Modeling

1 Jan 2021 · ZiHao Wang, Xu Zhao, Tam Le, Hao Wu, Yong Zhang, Makoto Yamada

Optimal Transport (OT) is a machine learning problem with applications including distribution comparison, generative adversarial networks, and unsupervised domain adaptation, to name a few. In the deep learning literature, the Wasserstein distance, which can be estimated by linear programming and/or the Sinkhorn algorithm, is widely used to compare two distributions. However, those optimization algorithms have complexity of at least $O(n^2)$, where $n$ is the number of samples in the data set. Thus, it is generally hard to use the traditional algorithms for large-scale problems. Recently, researchers have tried to speed up Wasserstein computation by using closed-form solutions with complexity lower than $O(n^2)$. For example, the sliced Wasserstein distance solves OT in $O(n\log n)$ time in one-dimensional space by sorting the samples, but it suffers from information loss when the data are projected onto one dimension. In this work, we consider OT over tree metrics, which generalizes the sliced Wasserstein distance and includes it as a special case, and we propose a fast $O(n)$ minimization algorithm for the optimal Wasserstein-1 transport plan between two distributions on a tree structure. To this end, we propose an online algorithm for adaptive tree construction during the minimization of the Wasserstein loss. Thanks to the structure of the proposed algorithm, we can obtain the alignment in $O(n\log n)$ time. We then apply the proposed Wasserstein minimization algorithm to generative model estimation and propose a tree Wasserstein Auto-Encoder. Through extensive experiments on 2D toy examples, high-dimensional Gaussian mixtures, and real-world datasets, we show that our algorithm is comparable to the one-dimensional sliced Wasserstein distance in computation time but more effective on higher-dimensional problems. Moreover, using real-world datasets, we demonstrate the efficacy of the proposed algorithms.
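
To make the two closed forms mentioned above concrete, here is a minimal sketch. It is not the adaptive tree construction proposed in this work. It only illustrates (a) Wasserstein-1 between equal-size samples in one dimension via sorting, and (b) the standard closed form for Wasserstein-1 on a fixed tree, $\sum_e w_e\,|\mu(\Gamma_e) - \nu(\Gamma_e)|$, where $\Gamma_e$ is the subtree below edge $e$. The function names, the `parent`/`weight` array encoding, and the assumption that `parent[i] < i` are all illustrative choices, not taken from the paper.

```python
import numpy as np


def wasserstein1_1d(x, y):
    """W1 between two equal-size empirical distributions on the real line.

    Sorting dominates the cost, so this runs in O(n log n).
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample counts")
    # In 1D the optimal plan matches the i-th smallest x to the i-th smallest y.
    return float(np.mean(np.abs(x - y)))


def tree_wasserstein1(parent, weight, mu, nu):
    """W1 between node measures mu and nu on a fixed rooted tree.

    Assumed encoding (hypothetical, for illustration only):
      - node 0 is the root, and parent[i] < i for every node i > 0;
      - weight[i] is the length of the edge (i, parent[i]); weight[0] is unused;
      - mu[i], nu[i] are the masses placed on node i, each summing to 1.
    One bottom-up pass over the nodes, so the cost is linear in the tree size.
    """
    diff = np.asarray(mu, dtype=float) - np.asarray(nu, dtype=float)
    total = 0.0
    for i in range(len(parent) - 1, 0, -1):
        # diff[i] now holds the net mass of the whole subtree rooted at i,
        # which must cross the edge (i, parent[i]).
        total += weight[i] * abs(diff[i])
        diff[parent[i]] += diff[i]
    return total


# A chain (path) tree reproduces the 1D case: nodes at positions 0, 1, 2.
chain_parent = [0, 0, 1]
chain_weight = [0.0, 1.0, 1.0]
w_tree = tree_wasserstein1(chain_parent, chain_weight, [0.5, 0.5, 0.0], [0.0, 0.5, 0.5])
w_line = wasserstein1_1d([0.0, 1.0], [1.0, 2.0])
```

In this toy case both computations describe the same problem (samples {0, 1} versus {1, 2}), which shows in a small way how a chain-shaped tree recovers the one-dimensional setting.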
