Search Results for author: Masahiro Ozawa

Found 1 paper, 0 papers with code

Gradient Noise Convolution (GNC): Smoothing Loss Function for Distributed Large-Batch SGD

no code implementations • 26 Jun 2019 • Kosuke Haruki, Taiji Suzuki, Yohei Hamakawa, Takeshi Toda, Ryuji Sakai, Masahiro Ozawa, Mitsuhiro Kimura

Large-batch stochastic gradient descent (SGD) is widely used for training in distributed deep learning because of its training-time efficiency; however, extremely large-batch SGD generalizes poorly and easily converges to sharp minima, which prevents naive large-scale data-parallel SGD (DP-SGD) from reaching good minima.
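The title refers to smoothing the loss function by convolving it with noise. As a rough illustration of that general idea only (not the paper's exact GNC procedure, which is not described in this listing), the PyTorch sketch below averages gradients taken at noise-perturbed copies of the weights, which approximates the gradient of the noise-smoothed loss. The function name and the `sigma` and `n_samples` hyper-parameters are illustrative assumptions.

```python
import torch

def noise_smoothed_grad(model, loss_fn, batch, sigma=0.01, n_samples=4):
    """Approximate the gradient of the loss convolved with Gaussian noise
    by averaging gradients evaluated at noise-perturbed weights.
    Hypothetical sketch; not the authors' GNC algorithm."""
    params = [p for p in model.parameters() if p.requires_grad]
    avg_grads = [torch.zeros_like(p) for p in params]
    inputs, targets = batch

    for _ in range(n_samples):
        # Perturb the weights in place, compute the gradient, then restore.
        noises = [sigma * torch.randn_like(p) for p in params]
        with torch.no_grad():
            for p, n in zip(params, noises):
                p.add_(n)
        model.zero_grad()
        loss_fn(model(inputs), targets).backward()
        with torch.no_grad():
            for p, n, g in zip(params, noises, avg_grads):
                g.add_(p.grad / n_samples)  # accumulate the averaged gradient
                p.sub_(n)                   # restore the original weights
    return avg_grads
```

In practice such a smoothed gradient would replace the raw mini-batch gradient in the optimizer step, trading extra gradient evaluations for a flatter effective loss surface.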