Thinking Outside the Ball: Optimal Learning with Gradient Descent for Generalized Linear Stochastic Convex Optimization

27 Feb 2022  ·  Idan Amir, Roi Livni, Nathan Srebro ·

We consider linear prediction with a convex Lipschitz loss, or more generally, stochastic convex optimization problems of generalized linear form, i.e.~where each instantaneous loss is a scalar convex function of a linear function. We show that in this setting, early stopped Gradient Descent (GD), without any explicit regularization or projection, ensures excess error at most $\epsilon$ (compared to the best possible with unit Euclidean norm) with an optimal, up to logarithmic factors, sample complexity of $\tilde{O}(1/\epsilon^2)$ and only $\tilde{O}(1/\epsilon^2)$ iterations. This contrasts with general stochastic convex optimization, where $\Omega(1/\epsilon^4)$ iterations are needed Amir et al. [2021b]. The lower iteration complexity is ensured by leveraging uniform convergence rather than stability. But instead of uniform convergence in a norm ball, which we show can guarantee suboptimal learning using $\Theta(1/\epsilon^4)$ samples, we rely on uniform convergence in a distribution-dependent ball.

PDF Abstract
No code implementations yet. Submit your code now

Tasks


Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here