Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

11 Feb 2015 · Sergey Ioffe, Christian Szegedy

Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.9% top-5 validation error (and 4.8% test error), exceeding the accuracy of human raters.
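As a rough illustration of the mechanism the abstract describes, the sketch below normalizes each feature of a mini-batch to zero mean and unit variance and then applies a learned scale and shift. This is a minimal NumPy sketch of the forward pass only (no running statistics for inference, no gradients); the function and variable names are illustrative, not code from the paper.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize a mini-batch of activations x (shape: [batch, features]),
    then apply the learned per-feature scale (gamma) and shift (beta)."""
    mu = x.mean(axis=0)                    # per-feature mini-batch mean
    var = x.var(axis=0)                    # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalized activations
    return gamma * x_hat + beta            # scaled and shifted output

# Usage: a batch of 32 examples with 4 features, deliberately off-center
x = np.random.randn(32, 4) * 10 + 5
y = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0), y.std(axis=0))  # approximately 0 and 1 per feature
```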


Datasets

ImageNet


Results from the Paper


TASK                  DATASET   MODEL         METRIC NAME        METRIC VALUE   GLOBAL RANK
Image Classification  ImageNet  Inception V2  Top 1 Accuracy     74.8%          # 330
                                              Top 5 Accuracy     92.2%          # 177
                                              Number of params   11.2M          # 166

Methods used in the Paper


METHOD                  TYPE
Convolution             Convolutions
Auxiliary Classifier    Miscellaneous Components
1x1 Convolution         Convolutions
ReLU                    Activation Functions
Dropout                 Regularization
Dense Connections       Feedforward Networks
Max Pooling             Pooling Operations
Softmax                 Output Functions
Random Horizontal Flip  Image Data Augmentation
Random Resized Crop     Image Data Augmentation
SGD with Momentum       Stochastic Optimization
Exponential Decay       Learning Rate Schedules
Inception Module        Image Model Blocks
Inception v2            Convolutional Neural Networks
Weight Decay            Regularization
Batch Normalization     Normalization