In Bayesian inference, posterior distributions are difficult to obtain
analytically for complex models such as neural networks. Variational inference
typically approximates the posterior with a parametric distribution from which
samples can easily be drawn.
Recently, discrete approximation by particles has attracted attention because
of its high expressive power. One example is Stein variational gradient
descent (SVGD), which iteratively optimizes a set of particles. Although SVGD
has been shown empirically to be computationally efficient, its theoretical
properties remain unclear, and no finite-sample bound on its convergence rate
is known. Another example is the Stein points (SP) method, which directly
minimizes the kernelized Stein discrepancy. Although SP enjoys a theoretical
finite-sample bound, it is empirically computationally inefficient, especially
in high-dimensional problems. In this paper, we propose a novel method,
maximum mean discrepancy minimization by the Frank-Wolfe algorithm (MMD-FW),
which greedily minimizes the MMD via the FW algorithm. Our method is
empirically computationally efficient, and we show that its finite-sample
convergence bound is of linear order in finite dimensions.
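To illustrate the greedy MMD-minimization idea, here is a minimal sketch in the spirit of kernel herding, a special case of Frank-Wolfe on the squared MMD with step size 1/(t+1). All names (`rbf`, the Gaussian target, the uniform candidate pool, the particle count) are illustrative assumptions, not the paper's actual algorithm or experimental setup: at each step, the FW linear-minimization oracle picks the candidate point whose kernel embedding best aligns with the residual between the target mean embedding and the current particle embedding.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(x, y, gamma=1.0):
    # RBF kernel matrix between point sets of shape (n, d) and (m, d).
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Samples from a 2-D Gaussian stand in for the intractable target posterior.
target = rng.normal(size=(2000, 2))

# Candidate pool from which greedy FW steps pick new particles (an assumption;
# other oracles, e.g. continuous optimization, are possible).
pool = rng.uniform(-4.0, 4.0, size=(4000, 2))

# <k(x, .), mu_p> for each candidate, with mu_p the empirical target embedding.
mu_pool = rbf(pool, target).mean(axis=1)

chosen = []
for t in range(50):
    # <k(x, .), mu_qt> for the current uniformly weighted particle set.
    cross = rbf(pool, np.array(chosen)).mean(axis=1) if chosen else 0.0
    # FW oracle: argmax of <k(x, .), mu_p - mu_qt>, i.e. the steepest
    # decrease of the squared MMD when adding one more particle.
    idx = int(np.argmax(mu_pool - cross))
    chosen.append(pool[idx])

particles = np.array(chosen)
# With a symmetric target, the particle mean should be close to (0, 0).
print(particles.mean(axis=0))
```

The `1/(t+1)` step size is implicit in the uniform reweighting of `chosen` at each iteration; line-search or fully corrective FW variants would reweight past particles instead.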