Spurious Local Minima are Common in Two-Layer ReLU Neural Networks

ICML 2018 Itay SafranOhad Shamir

We consider the optimization problem associated with training simple ReLU neural networks of the form $\mathbf{x}\mapsto \sum_{i=1}^{k}\max\{0,\mathbf{w}_i^\top \mathbf{x}\}$ with respect to the squared loss. We provide a computer-assisted proof that even if the input distribution is standard Gaussian, even if the dimension is arbitrarily large, and even if the target values are generated by such a network, with orthonormal parameter vectors, the problem can still have spurious local minima once $6\le k\le 20$... (read more)

PDF Abstract

Code


No code implementations yet. Submit your code now

Tasks


Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.