The Break-Even Point on Optimization Trajectories of Deep Neural Networks

21 Feb 2020 · Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof Geras

The early phase of training of deep neural networks is critical for their final performance. In this work, we study how the hyperparameters of stochastic gradient descent (SGD) used in the early phase of training affect the rest of the optimization trajectory...
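As a toy illustration of the claim in the abstract (not code from the paper), the sketch below runs SGD on a simple quadratic loss with two schedules that differ only in the learning rate used during the first few steps. The function name, loss, and all hyperparameter values are illustrative assumptions.

```python
# Illustrative sketch (not from the paper): two SGD runs on the toy loss
# f(w) = 0.5 * w**2 that share the same late-phase learning rate but
# differ in the learning rate used for the first `switch_step` updates.

def sgd_trajectory(early_lr, late_lr, switch_step=5, steps=50, w0=5.0):
    """Minimize f(w) = 0.5 * w**2 with SGD, switching learning rates."""
    w = w0
    trajectory = [w]
    for t in range(steps):
        lr = early_lr if t < switch_step else late_lr
        grad = w  # gradient of 0.5 * w**2 is w
        w -= lr * grad
        trajectory.append(w)
    return trajectory

# Same late phase, different early phase:
small_early = sgd_trajectory(early_lr=0.1, late_lr=0.1)
large_early = sgd_trajectory(early_lr=1.5, late_lr=0.1)

# The early-phase choice determines where the trajectory sits when the
# late phase begins, and hence shapes the rest of the run.
print(small_early[5], large_early[5])
```

Even on this one-dimensional problem, the iterate reached by the end of the early phase differs sharply between the two runs, which is the (much simplified) sense in which early-phase hyperparameters condition the remainder of the trajectory.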




