no code implementations • 1 Jan 2021 • Mingwei Wei, David J. Schwab
The strength of this effect is proportional to the squared learning rate and the inverse batch size, and the effect is strongest during the early phase of training, when the model's predictions are poor.
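A minimal sketch (not the paper's code) of where this scaling comes from: the variance of a single mini-batch SGD update grows as the squared learning rate over the batch size. The toy quadratic loss, dataset, and all names below are assumptions made purely for illustration.

```python
# Illustrative only: empirical SGD update variance vs. the lr^2 / batch_size prediction.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=100_000)  # per-example "targets"
theta = 0.0                                          # single scalar parameter

def update_variance(lr, batch_size, n_steps=5_000):
    """Empirical variance of the SGD update -lr * mean(theta - batch)."""
    updates = []
    for _ in range(n_steps):
        batch = rng.choice(data, size=batch_size, replace=False)
        grad = np.mean(theta - batch)    # gradient of 0.5*(theta - x)^2, batch-averaged
        updates.append(-lr * grad)
    return np.var(updates)

for lr in (0.1, 0.2):
    for bs in (16, 64):
        v = update_variance(lr, bs)
        print(f"lr={lr:.1f} batch={bs:3d}  empirical var={v:.2e}  "
              f"lr^2/B prediction={lr**2 / bs:.2e}")
```

With unit per-example gradient variance, the printed empirical variance tracks lr²/B, which is the scaling the abstract refers to.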
no code implementations • 1 Oct 2019 • Mingwei Wei, David J. Schwab
Stochastic gradient descent (SGD) forms the core optimization method for deep neural networks.
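For context, a generic mini-batch SGD loop looks like the sketch below (an illustration of the standard update rule w ← w − lr·grad, not the specific method analyzed in the paper; the data and hyperparameters are arbitrary assumptions).

```python
# Illustrative only: mini-batch SGD on a toy linear-regression loss.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=1_000)

w = np.zeros(5)
lr, batch_size = 0.05, 32
for step in range(2_000):
    idx = rng.integers(0, len(X), size=batch_size)   # sample a random mini-batch
    Xb, yb = X[idx], y[idx]
    grad = 2.0 * Xb.T @ (Xb @ w - yb) / batch_size   # gradient of the mean squared error
    w -= lr * grad                                   # SGD update
print("||w - true_w|| =", np.linalg.norm(w - true_w))
```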
no code implementations • ICLR 2019 • Mingwei Wei, James Stokes, David J. Schwab
Batch Normalization (BatchNorm) is an extremely useful component of modern neural network architectures, enabling optimization using higher learning rates and achieving faster convergence.
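As a reminder of the transform being discussed, here is a minimal sketch of the BatchNorm forward pass at training time (illustrative only; `gamma`, `beta`, and `eps` follow the usual conventions, and the running statistics used at inference are omitted).

```python
# Illustrative only: per-feature batch normalization followed by a learnable scale and shift.
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize activations over the batch dimension, then scale and shift."""
    mean = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                       # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)   # zero mean, unit variance per feature
    return gamma * x_hat + beta               # learnable scale and shift

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(64, 8))          # a batch of activations
out = batch_norm_forward(x, gamma=np.ones(8), beta=np.zeros(8))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))  # ~0 mean, ~1 std
```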