Bayesian Learning via Stochastic Gradient Langevin Dynamics

ICML 2011  ·  Max Welling, Yee Whye Teh

In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small mini-batches. By adding the right amount of noise to a standard stochastic gradient optimization algorithm we show that the iterates will converge to samples from the true posterior distribution as we anneal the stepsize. This seamless transition between optimization and Bayesian posterior sampling provides an in-built protection against overfitting. We also propose a practical method for Monte Carlo estimates of posterior statistics which monitors a “sampling threshold” and collects samples after it has been surpassed. We apply the method to three models: a mixture of Gaussians, logistic regression and ICA with natural gradients.
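
The update the abstract describes adds Gaussian noise, scaled by the stepsize, to a mini-batch stochastic gradient step on the log posterior. Below is a minimal sketch of that update on a toy Gaussian model; the helper functions, the polynomial stepsize schedule, and the specific constants are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

# Sketch of a stochastic gradient Langevin dynamics (SGLD) step:
#   delta_theta = (eps_t / 2) * (grad log p(theta) + (N / n) * sum_i grad log p(x_i | theta)) + eta_t,
#   with injected noise eta_t ~ N(0, eps_t) and eps_t annealed toward zero.

def grad_log_prior(theta):
    # Standard normal prior: grad of log p(theta) = -theta (toy choice).
    return -theta

def grad_log_lik(theta, x_batch):
    # Unit-variance Gaussian likelihood x_i ~ N(theta, 1) (toy choice).
    return np.sum(x_batch - theta)

def sgld(data, n_iters=10_000, batch_size=10, a=0.01, b=1.0, gamma=0.55, seed=0):
    rng = np.random.default_rng(seed)
    N = len(data)
    theta = 0.0
    samples = []
    for t in range(1, n_iters + 1):
        eps_t = a * (b + t) ** (-gamma)          # polynomially decaying stepsize
        batch = rng.choice(data, size=batch_size, replace=False)
        grad = grad_log_prior(theta) + (N / batch_size) * grad_log_lik(theta, batch)
        noise = rng.normal(0.0, np.sqrt(eps_t))  # injected noise with variance eps_t
        theta = theta + 0.5 * eps_t * grad + noise
        samples.append(theta)
    return np.array(samples)

if __name__ == "__main__":
    data = np.random.default_rng(1).normal(2.0, 1.0, size=1000)
    samples = sgld(data)
    # Early iterates behave like stochastic gradient optimization; later ones
    # can be treated as (approximate) posterior samples, so discard a burn-in.
    print("posterior mean estimate:", samples[5000:].mean())
```

In this sketch the first phase of iterates is dominated by the gradient term (optimization), while as eps_t shrinks the injected noise dominates and the chain behaves like a posterior sampler; the paper's "sampling threshold" heuristic decides when to start collecting samples.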
