Critical Point-Finding Methods Reveal Gradient-Flat Regions of Deep Network Losses

23 Mar 2020 · Charles G. Frye, James Simon, Neha S. Wadia, Andrew Ligeralde, Michael R. DeWeese, Kristofer E. Bouchard

Although the loss functions of deep neural networks are highly non-convex, gradient-based optimization algorithms converge to approximately the same performance from many random initial points. One thread of work has sought to explain this phenomenon by characterizing the local curvature near critical points of the loss function, where the gradients are near zero, and by demonstrating that neural network losses enjoy a no-bad-local-minima property and an abundance of saddle points...
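The critical points referenced above are parameter settings where the gradient of the loss vanishes; a standard way to locate them numerically is to run descent not on the loss itself but on the squared gradient norm, which is minimized (at zero) exactly at critical points, including saddles that ordinary gradient descent would escape. The sketch below illustrates this on a hypothetical two-parameter toy loss with a saddle at the origin; the toy function and step size are illustrative choices, not the paper's actual method or networks.

```python
import numpy as np

# Toy non-convex stand-in for a network loss, with a saddle point at (0, 0):
# L(w) = w0^2 - w1^2
def loss(w):
    return w[0] ** 2 - w[1] ** 2

def grad(w):
    # Analytic gradient of the toy loss L.
    return np.array([2 * w[0], -2 * w[1]])

def sq_grad_norm(w):
    # g(w) = ||∇L(w)||^2 is zero exactly at critical points of L.
    g = grad(w)
    return g @ g

def num_grad(f, w, eps=1e-6):
    # Central-difference gradient of a scalar function f at w.
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

# Descending on ||∇L||^2 converges to the saddle at the origin,
# a point plain gradient descent on L would move away from.
w = np.array([1.0, 0.5])
for _ in range(200):
    w -= 0.05 * num_grad(sq_grad_norm, w)

print(np.round(w, 4))  # w is driven toward the critical point (0, 0)
```

Critical point-finding methods used in practice (e.g. Newton-type solvers) are far more sophisticated, but they share this core idea: treat the gradient norm as the objective and look for its zeros.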

