Dynamics of Deep Equilibrium Linear Models

ICLR 2021 · Kenji Kawaguchi

A deep equilibrium linear model is implicitly defined through an equilibrium point of an infinite sequence of computations. It avoids any explicit computation of the infinite sequence by finding an equilibrium point directly via root-finding and by computing gradients via implicit differentiation. In this paper, we analyze the gradient dynamics of deep equilibrium linear models with non-convex objective functions for a general class of losses used in regression and classification. Convergence to a global optimum at a linear rate is guaranteed without any assumption on the width of the models, allowing the width to be smaller than the output dimension and the number of data points. Moreover, we prove a relation between the gradient dynamics of deep equilibrium linear models and the dynamics of a trust-region Newton method applied to shallow models with time-dependent quadratic norms. This mathematically proven relation, together with our numerical observations, suggests the importance of understanding implicit bias and points to a possible open problem on the topic. Our proofs differ significantly from those in the related literature.
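To make the setup concrete, here is a minimal sketch, not the paper's code: a deep equilibrium linear model whose layer map is assumed to take the form f(z, x) = A z + B x. The equilibrium z* = A z* + B x is found by root-finding (plain fixed-point iteration, checked against the closed form (I − A)⁻¹ B x), and the gradient is obtained by implicit differentiation rather than by unrolling the infinite sequence. The matrices A and B, the target y, and the squared loss are illustrative choices, not the paper's exact parametrization.

```python
import numpy as np

rng = np.random.default_rng(0)
d, p = 4, 3                                          # hidden width d, input dim p
A = 0.5 * rng.standard_normal((d, d)) / np.sqrt(d)   # scaled so spectral radius < 1
B = rng.standard_normal((d, p))
x = rng.standard_normal(p)

# Forward pass: find the equilibrium of z = A z + B x by fixed-point iteration,
# i.e., root-finding on g(z) = z - A z - B x, instead of unrolling infinitely.
z = np.zeros(d)
for _ in range(200):
    z_next = A @ z + B @ x
    if np.linalg.norm(z_next - z) < 1e-12:
        break
    z = z_next

# Sanity check against the closed-form equilibrium (I - A)^{-1} B x.
z_star = np.linalg.solve(np.eye(d) - A, B @ x)
assert np.allclose(z, z_star, atol=1e-8)

# Backward pass via implicit differentiation: for a loss L(z*), the implicit
# function theorem applied to g(z*) = 0 gives dL/dB = ((I - A)^{-T} dL/dz*) x^T,
# so one adjoint linear solve replaces backprop through the infinite sequence.
y = rng.standard_normal(d)                           # hypothetical regression target
dL_dz = z_star - y                                   # gradient of L = 0.5 * ||z* - y||^2
u = np.linalg.solve((np.eye(d) - A).T, dL_dz)        # adjoint solve
grad_B = np.outer(u, x)                              # dL/dB

# Finite-difference check on one entry of B.
eps = 1e-6
B_pert = B.copy()
B_pert[0, 0] += eps
z_pert = np.linalg.solve(np.eye(d) - A, B_pert @ x)
fd = (0.5 * np.sum((z_pert - y) ** 2) - 0.5 * np.sum((z_star - y) ** 2)) / eps
assert abs(fd - grad_B[0, 0]) < 1e-4
```

The adjoint solve is the key point: the gradient costs one linear system in (I − A)ᵀ, independently of how many iterations the forward root-finder took, which is what lets the paper analyze the gradient dynamics without ever referencing the unrolled sequence.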
