no code implementations • 7 May 2018 • Santosh Vempala, John Wilmes
We give an agnostic learning guarantee for gradient descent (GD): starting from a randomly initialized network, it converges in mean squared loss to the minimum error (in $2$-norm) achievable by the best approximation of the target function using a polynomial of degree at most $k$.
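As a rough illustration of this training setup, here is a minimal NumPy sketch: a one-hidden-layer network with random hidden weights, where plain gradient descent on the mean squared loss fits the output-layer weights. The width `m`, the sigmoid activation, the synthetic target, and the choice to freeze the hidden layer are all illustrative assumptions, not the paper's exact algorithm.

```python
# Minimal sketch (illustrative assumptions, not the paper's exact setup):
# gradient descent on mean squared loss for a randomly initialized
# one-hidden-layer network, training only the output-layer weights.
import numpy as np

rng = np.random.default_rng(0)
n, m, N = 10, 200, 1000                        # input dim, hidden width, samples

X = rng.standard_normal((N, n))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # inputs on the unit sphere
y = np.sin(X @ rng.standard_normal(n))         # hypothetical target function

W = rng.standard_normal((m, n))                # random (frozen) hidden weights
a = np.zeros(m)                                # output weights, trained by GD

H = 1.0 / (1.0 + np.exp(-X @ W.T))             # sigmoid hidden-layer features

lr = 1.0 / (np.linalg.norm(H, ord=2) ** 2 / N) # step size below 2 / lambda_max
for step in range(500):
    grad = H.T @ (H @ a - y) / N               # gradient of mean squared loss
    a -= lr * grad

print("train MSE:", np.mean((H @ a - y) ** 2))
```

With the hidden layer frozen, the loss is convex in `a`, so GD converges to the best fit within the span of the random features; the paper's guarantee is of this flavor, but it is stated against the error of the best degree-$k$ polynomial approximation of the target.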
no code implementations • NeurIPS 2017 • Le Song, Santosh Vempala, John Wilmes, Bo Xie
Moreover, this hard family of functions is realizable with a small (sublinear in the dimension) number of activation units in the single hidden layer.
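To make the size claim concrete, the sketch below instantiates a single-hidden-layer network whose width grows sublinearly in the input dimension (here $m = \lceil \sqrt{n} \rceil$); the weights and sigmoid activation are placeholders, not the paper's hard function family.

```python
# Illustrative only: a one-hidden-layer network whose hidden width m is
# sublinear in the input dimension n (m = ceil(sqrt(n)) here); the weights
# and activation are placeholders, not the paper's hard family of functions.
import numpy as np

def one_hidden_layer(x, W, a):
    """Compute a . sigmoid(W x) for a network with len(a) hidden units."""
    return a @ (1.0 / (1.0 + np.exp(-W @ x)))

rng = np.random.default_rng(1)
n = 10_000                          # input dimension
m = int(np.ceil(np.sqrt(n)))        # sublinear number of hidden units (100)

W = rng.standard_normal((m, n)) / np.sqrt(n)
a = rng.standard_normal(m)

x = rng.standard_normal(n)
print(f"{m} hidden units for dimension {n}:", one_hidden_layer(x, W, a))
```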