Faster SGD training by minibatch persistency

19 Jun 2018 · Matteo Fischetti, Iacopo Mandatelli, Domenico Salvagnin

It is well known that, for most datasets, the use of large-size minibatches for Stochastic Gradient Descent (SGD) typically leads to slow convergence and poor generalization. On the other hand, large minibatches are of great practical interest as they allow for a better exploitation of modern GPUs...
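The abstract is truncated before it describes the method itself, but the title points at the idea: "minibatch persistency" here means reusing (persisting) the same minibatch for several consecutive SGD iterations rather than drawing a fresh one at every step, so the cost of preparing and transferring a large batch is amortized over multiple updates. Below is a minimal PyTorch sketch under that reading; the persistency factor K, the toy model, and the synthetic data are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch of a "minibatch persistency" training loop:
# take K consecutive SGD steps on each minibatch instead of one.
# K, the model, and the data below are illustrative assumptions.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

K = 4  # assumed persistency factor: SGD steps per minibatch

# Toy data and model so the sketch runs end to end.
X = torch.randn(1024, 20)
y = torch.randint(0, 2, (1024,))
loader = DataLoader(TensorDataset(X, y), batch_size=256, shuffle=True)

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(3):
    for xb, yb in loader:
        # Persistency: reuse this minibatch for K consecutive updates
        # before fetching the next one from the loader.
        for _ in range(K):
            opt.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            opt.step()
```

With K = 1 this reduces to standard minibatch SGD; larger K trades some gradient freshness for fewer batch transfers per update, which is where the GPU-utilization argument in the abstract comes in.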
