We analyze algorithms for approximating a function $f(x) = \Phi x$ mapping $\Re^d$ to $\Re^d$ using deep linear neural networks, i.e., networks that learn a function $h$ parameterized by matrices $\Theta_1, \ldots, \Theta_L$ and defined by $h(x) = \Theta_L \Theta_{L-1} \cdots \Theta_1 x$. We focus on algorithms that learn through gradient descent on the population quadratic loss in the case that the distribution over the inputs is isotropic...
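
The setup in the abstract can be made concrete with a small numerical sketch. If the inputs are isotropic in the sense that $E[xx^\top] = I$, the population quadratic loss $\frac{1}{2} E\|h(x) - \Phi x\|^2$ reduces to $\frac{1}{2}\|\Theta_L \cdots \Theta_1 - \Phi\|_F^2$, so gradient descent can be run directly on the matrices. The code below is only an illustration under assumptions that the excerpt does not state: every layer is initialized at the identity, and the step size, depth, and target matrix are arbitrary choices. The helper names (`product`, `population_loss`, `gradient_step`, `train`) are hypothetical.

```python
import numpy as np


def product(mats, d):
    """Return mats[-1] @ ... @ mats[0], or the d x d identity if mats is empty."""
    out = np.eye(d)
    for m in mats:
        out = m @ out
    return out


def population_loss(thetas, phi):
    """Loss 0.5 * ||Theta_L ... Theta_1 - Phi||_F^2 (valid when E[x x^T] = I)."""
    d = phi.shape[0]
    return 0.5 * np.linalg.norm(product(thetas, d) - phi, "fro") ** 2


def gradient_step(thetas, phi, lr):
    """One simultaneous gradient descent update of all layers."""
    d = phi.shape[0]
    residual = product(thetas, d) - phi
    updated = []
    for i in range(len(thetas)):
        below = product(thetas[:i], d)       # Theta_{i-1} ... Theta_1
        above = product(thetas[i + 1:], d)   # Theta_L ... Theta_{i+1}
        grad = above.T @ residual @ below.T  # d(loss) / d(Theta_i)
        updated.append(thetas[i] - lr * grad)
    return updated


def train(phi, depth=3, lr=0.05, steps=500):
    """Assumption: identity initialization; hyperparameters are illustrative."""
    d = phi.shape[0]
    thetas = [np.eye(d) for _ in range(depth)]
    losses = [population_loss(thetas, phi)]
    for _ in range(steps):
        thetas = gradient_step(thetas, phi, lr)
        losses.append(population_loss(thetas, phi))
    return thetas, losses


if __name__ == "__main__":
    # Illustrative symmetric positive definite target, chosen for this sketch.
    phi = np.array([[1.5, 0.2], [0.2, 0.8]])
    thetas, losses = train(phi)
    print("initial loss:", losses[0])
    print("final loss:", losses[-1])
```

The target here is a symmetric positive definite matrix close to the identity, chosen only because the iteration behaves well on it; the sketch makes no claim about which targets the analysis covers.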
