We identify a class of over-parameterized deep neural networks with standard
activation functions and cross-entropy loss which provably have no bad local
valley, in the sense that from any point in parameter space there exists a
continuous path on which the cross-entropy loss is non-increasing and gets
arbitrarily close to zero. This implies that these networks have no sub-optimal
strict local minima.