# Gradient descent with identity initialization efficiently learns positive definite linear transformations by deep residual networks

Peter L. Bartlett, David P. Helmbold, Philip M. Long

We analyze algorithms for approximating a function $f(x) = \Phi x$ mapping $\Re^d$ to $\Re^d$ using deep linear neural networks, i.e., algorithms that learn a function $h$ parameterized by matrices $\Theta_1, \ldots, \Theta_L$ and defined by $h(x) = \Theta_L \Theta_{L-1} \cdots \Theta_1 x$. We focus on algorithms that learn through gradient descent on the population quadratic loss in the case that the distribution over the inputs is isotropic...
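The setup described in the abstract can be illustrated with a small numerical sketch. It is not the authors' implementation. It assumes that "isotropic" means $E[xx^\top] = I$, in which case the population quadratic loss reduces to $\frac{1}{2}\|\Theta_L \cdots \Theta_1 - \Phi\|_F^2$. The function names (`product`, `loss`, `gradients`, `train`), the depth, the step size, the number of steps, and the example matrix `Phi` are all hypothetical choices made for illustration.

```python
import numpy as np

def product(thetas):
    """Return Theta_L @ ... @ Theta_1 for thetas = [Theta_1, ..., Theta_L]."""
    P = np.eye(thetas[0].shape[0])
    for T in thetas:
        P = T @ P
    return P

def loss(thetas, Phi):
    # Assumption: inputs satisfy E[x x^T] = I, so the population quadratic
    # loss equals half the squared Frobenius distance to Phi.
    E = product(thetas) - Phi
    return 0.5 * np.sum(E * E)

def gradients(thetas, Phi):
    """Gradient of the loss with respect to each layer matrix."""
    L, d = len(thetas), Phi.shape[0]
    E = product(thetas) - Phi
    # below[i] is the product of the layers applied before layer i.
    below = [np.eye(d)]
    for T in thetas[:-1]:
        below.append(T @ below[-1])
    # above[i] is the product of the layers applied after layer i.
    above = [None] * L
    A = np.eye(d)
    for i in range(L - 1, -1, -1):
        above[i] = A
        A = A @ thetas[i]
    # d/dTheta of 0.5 * ||A Theta B - Phi||_F^2 is A^T E B^T.
    return [above[i].T @ E @ below[i].T for i in range(L)]

def train(Phi, depth=5, lr=0.05, steps=500):
    """Plain gradient descent, every layer initialized to the identity."""
    d = Phi.shape[0]
    thetas = [np.eye(d) for _ in range(depth)]
    for _ in range(steps):
        grads = gradients(thetas, Phi)
        thetas = [T - lr * G for T, G in zip(thetas, grads)]
    return thetas

# Hypothetical target: a small symmetric positive definite matrix.
Phi = np.array([[2.0, 0.5],
                [0.5, 1.0]])
thetas = train(Phi)
final_loss = loss(thetas, Phi)
```

Calling `loss(thetas, Phi)` before and after `train` gives a quick check that the product of the layers moves toward `Phi` for this example. The step size here was chosen by hand for this particular matrix and is not a rate taken from the paper.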

