On the Convergence Rate of Training Recurrent Neural Networks

NeurIPS 2019 · Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song

How can local-search methods such as stochastic gradient descent (SGD) avoid bad local minima in training multi-layer neural networks? Why can they fit random labels even given non-convex and non-smooth architectures? …
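
The questions above concern SGD fitting random labels on non-convex, non-smooth architectures such as ReLU recurrent networks. As a minimal illustration of that setting (not code from the paper), the sketch below trains a single-layer ReLU Elman RNN with plain SGD on randomly labeled sequences; the width, sequence length, learning rate, and step count are hypothetical choices made for the example.

```python
# Hypothetical sketch of the setting described in the abstract: an
# over-parameterized ReLU RNN trained with plain SGD on random labels.
# All hyperparameters below are illustrative, not taken from the paper.
import torch
import torch.nn as nn

torch.manual_seed(0)

n, T, d, m = 64, 10, 16, 512           # samples, sequence length, input dim, hidden width
X = torch.randn(n, T, d)               # random input sequences
y = torch.randint(0, 2, (n,)).float()  # random (unstructured) binary labels

rnn = nn.RNN(d, m, nonlinearity="relu", batch_first=True)  # ReLU RNN: non-convex, non-smooth
head = nn.Linear(m, 1)                                     # linear readout of the last hidden state
params = list(rnn.parameters()) + list(head.parameters())
opt = torch.optim.SGD(params, lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(2000):
    opt.zero_grad()
    _, h_T = rnn(X)                        # h_T: (1, n, m), final hidden state per sequence
    logits = head(h_T.squeeze(0)).squeeze(-1)
    loss = loss_fn(logits, y)
    loss.backward()
    opt.step()
    if step % 200 == 0:
        acc = ((logits > 0).float() == y).float().mean()
        print(f"step {step:4d}  loss {loss.item():.4f}  train acc {acc.item():.2f}")
```

With enough width and small enough step size, this kind of run typically drives the training loss toward zero even though the labels carry no signal, which is the empirical phenomenon the paper's convergence analysis addresses.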
