1 code implementation • ICLR 2022 • Brett W. Larsen, Stanislav Fort, Nic Becker, Surya Ganguli
In particular, we show via Gordon's escape theorem, that the training dimension plus the Gaussian width of the desired loss sub-level set, projected onto a unit sphere surrounding the initialization, must exceed the total number of parameters for the success probability to be large.