Optimal Mini-Batch Size Selection for Fast Gradient Descent

15 Nov 2019Michael P. PerroneHaidar KhanChanghoan KimAnastasios KyrillidisJerry QuinnValentina Salapura

This paper presents a methodology for selecting the mini-batch size that minimizes Stochastic Gradient Descent (SGD) learning time for single and multiple learner problems. By decoupling algorithmic analysis issues from hardware and software implementation details, we reveal a robust empirical inverse law between mini-batch size and the average number of SGD updates required to converge to a specified error threshold... (read more)

PDF Abstract


No code implementations yet. Submit your code now

Results from the Paper

  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.