Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

ICML 2015  ·  Sergey Ioffe, Christian Szegedy

Training Deep Neural Networks is complicated by the fact that the distribution of each layer’s inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.9% top-5 validation error (and 4.8% test error), exceeding the accuracy of human raters.
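The per-mini-batch normalization the abstract describes can be sketched as follows. This is a minimal NumPy illustration of the training-time transform, not the paper's full algorithm (it omits the running statistics used at inference); the function name `batch_norm` and the example shapes are assumptions for illustration.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift.

    x:     mini-batch of activations, shape (batch_size, num_features)
    gamma: learned per-feature scale, shape (num_features,)
    beta:  learned per-feature shift, shape (num_features,)
    eps:   small constant for numerical stability
    """
    mu = x.mean(axis=0)                     # per-feature mini-batch mean
    var = x.var(axis=0)                     # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # zero mean, unit variance per feature
    return gamma * x_hat + beta             # learned scale/shift restore expressiveness

# Usage: a mini-batch whose features are far from zero-mean/unit-variance
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(64, 4))
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
```

With `gamma = 1` and `beta = 0`, each feature of `y` has (approximately) zero mean and unit variance over the mini-batch; the learned `gamma` and `beta` let the network recover the original activations if that is optimal.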
