We show that training a deep network using batch normalization is equivalent
to approximate inference in Bayesian models. We further demonstrate that this
finding allows us to make meaningful estimates of the model uncertainty using
conventional architectures, without modifications to the network or the
training procedure. Our approach is thoroughly validated by measuring the
quality of uncertainty in a series of empirical experiments on different tasks.
It outperforms baselines with strong statistical significance, and displays
competitive performance with recent Bayesian approaches.