Bayesian Learning via Stochastic Gradient Langevin Dynamics
In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small mini-batches. By adding the right amount of noise to a standard stochastic gradient optimization al- gorithm we show that the iterates will con- verge to samples from the true posterior dis- tribution as we anneal the stepsize. This seamless transition between optimization and Bayesian posterior sampling provides an in- built protection against overfitting. We also propose a practical method for Monte Carlo estimates of posterior statistics which moni- tors a “sampling threshold” and collects sam- ples after it has been surpassed. We apply the method to three models: a mixture of Gaussians, logistic regression and ICA with natural gradients.
PDF Abstract