no code implementations • 20 Feb 2023 • Weihang Xu, Simon S. Du
This is the first global convergence result for this problem beyond the exact-parameterization setting ($n=1$), in which gradient descent enjoys an $\exp(-\Omega(T))$ rate.