Large batch size training of neural networks with adversarial training and second-order information

The most straightforward way to accelerate Stochastic Gradient Descent (SGD) computation is to distribute the randomly selected batch of inputs over multiple processors. Keeping the distributed processors fully utilized, however, requires growing the batch size commensurately...

ICLR 2019
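As a rough illustration of the data-parallel setup the abstract describes (a sketch only, not the paper's method; PyTorch, the worker count, the batch sizes, and the toy linear model are all assumptions chosen for demonstration), the snippet below shards one global batch across simulated workers and averages their per-shard gradients, which reproduces a single SGD step on the full batch:

```python
# Minimal sketch: data-parallel SGD with a global batch that grows with the
# number of workers. All sizes and the model are illustrative placeholders.
import torch

torch.manual_seed(0)

num_workers = 4          # simulated data-parallel workers
per_worker_batch = 128   # samples handled by each worker
global_batch = num_workers * per_worker_batch  # grows commensurately with workers

model = torch.nn.Linear(32, 10)
loss_fn = torch.nn.CrossEntropyLoss()

# One global batch, sharded across the simulated workers.
x = torch.randn(global_batch, 32)
y = torch.randint(0, 10, (global_batch,))
shards_x = x.chunk(num_workers)
shards_y = y.chunk(num_workers)

# Each worker computes gradients on its shard; averaging them gives the
# gradient of the mean loss over the full global batch.
avg_grads = [torch.zeros_like(p) for p in model.parameters()]
for xs, ys in zip(shards_x, shards_y):
    model.zero_grad()
    loss_fn(model(xs), ys).backward()
    for g, p in zip(avg_grads, model.parameters()):
        g += p.grad / num_workers

# One SGD step with the averaged (large-batch) gradient.
lr = 0.1
with torch.no_grad():
    for p, g in zip(model.parameters(), avg_grads):
        p -= lr * g
```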
