no code implementations • 17 Aug 2023 • Elizabeth Collins-Woodfin, Courtney Paquette, Elliot Paquette, Inbar Seroussi
In addition to the deterministic equivalent, we introduce an SDE with a simplified diffusion coefficient (homogenized SGD) which allows us to analyze the dynamics of general statistics of SGD iterates.
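The homogenized-SGD idea can be illustrated with a toy simulation: run single-row SGD on a random least-squares problem and, alongside it, an Euler–Maruyama discretization of an SDE whose drift is the full-batch gradient and whose diffusion term is a crude isotropic stand-in (the paper's simplified diffusion coefficient is more refined). All dimensions, step sizes, and horizons below are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy high-dimensional least squares: f(x) = (1/(2n)) ||A x - b||^2.
n, d = 400, 100
A = rng.standard_normal((n, d))
x_star = rng.standard_normal(d)
b = A @ x_star                      # consistent labels, so min f = 0

risk = lambda x: 0.5 * np.mean((A @ x - b) ** 2)
gamma, steps = 0.01, 3000           # hypothetical step size and horizon

# Single-row SGD: one sampled row per step (unbiased for the full gradient).
x_sgd = np.zeros(d)
for _ in range(steps):
    i = rng.integers(n)
    x_sgd -= gamma * A[i] * (A[i] @ x_sgd - b[i])

# Euler--Maruyama discretization of a simplified SDE in the spirit of
# homogenized SGD: full-gradient drift plus a crude isotropic diffusion term.
x_sde = np.zeros(d)
for _ in range(steps):
    drift = A.T @ (A @ x_sde - b) / n
    x_sde += -gamma * drift + np.sqrt(gamma**2 / n) * rng.standard_normal(d)

print(f"initial risk: {risk(np.zeros(d)):.3f}")
print(f"SGD risk:     {risk(x_sgd):.5f}")
print(f"SDE risk:     {risk(x_sde):.5f}")
```

Both trajectories drive the risk far below its initial value, which is the qualitative point: statistics of the discrete SGD iterates can be tracked through a continuous-time diffusion.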
3 Jul 2023 • Afonso S. Bandeira, Antoine Maillard, Shahar Mendelson, Elliot Paquette

We consider the problem $(\mathrm{P})$ of fitting $n$ standard Gaussian random vectors in $\mathbb{R}^d$ to the boundary of a centered ellipsoid, as $n, d \to \infty$.
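A minimal sketch of the interpolation side of problem $(\mathrm{P})$: the constraint $x_i^\top S x_i = 1$ is linear in the entries of the symmetric matrix $S$, so for $n$ below $d(d+1)/2$ one can generically fit all points exactly by solving a linear system. This sketch deliberately ignores positive semidefiniteness of $S$ (quadric vs. ellipsoid), which is what makes the actual fitting threshold nontrivial; the dimensions below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

d = 8
n = 20          # well below d(d+1)/2 = 36, so an exact quadric fit is expected
X = rng.standard_normal((n, d))

# x^T S x = 1 is linear in the entries of the symmetric matrix S.
# Build the n x d(d+1)/2 design matrix over the upper triangle of S.
iu = np.triu_indices(d)
M = np.empty((n, len(iu[0])))
for k, (i, j) in enumerate(zip(*iu)):
    col = X[:, i] * X[:, j]
    M[:, k] = col if i == j else 2.0 * col   # off-diagonal entries appear twice

# Least-squares solve; residual ~ 0 means all points lie on the quadric.
s, *_ = np.linalg.lstsq(M, np.ones(n), rcond=None)
S = np.zeros((d, d))
S[iu] = s
S = S + np.triu(S, 1).T                      # symmetrize

residual = np.max(np.abs(np.einsum('ni,ij,nj->n', X, S, X) - 1.0))
print(f"max |x^T S x - 1| = {residual:.2e}")
```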
15 Jun 2022 • Courtney Paquette, Elliot Paquette, Ben Adlam, Jeffrey Pennington
Stochastic gradient descent (SGD) is a pillar of modern machine learning, serving as the go-to optimization algorithm for a diverse array of problems.
2 Jun 2022 • Kiwon Lee, Andrew N. Cheng, Courtney Paquette, Elliot Paquette
We analyze the dynamics of large batch stochastic gradient descent with momentum (SGD+M) on the least squares problem when both the number of samples and dimensions are large.
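A toy version of the setting can be sketched as heavy-ball minibatch SGD on a random least-squares instance. The batch size, step size, and momentum parameter below are hypothetical and chosen only so the iteration is stable; this is not the paper's exact scaling regime.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 800, 200
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d)      # consistent system, min risk 0
batch = 100                         # "large batch" (hypothetical size)
gamma, beta = 0.002, 0.9            # step size and momentum (hypothetical)

x = np.zeros(d)
v = np.zeros(d)
for _ in range(1500):
    idx = rng.integers(n, size=batch)
    g = A[idx].T @ (A[idx] @ x - b[idx]) / batch   # minibatch gradient
    v = beta * v - gamma * g                        # heavy-ball momentum
    x = x + v

risk = 0.5 * np.mean((A @ x - b) ** 2)
print(f"final risk: {risk:.4f}")
```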
14 May 2022 • Courtney Paquette, Elliot Paquette, Ben Adlam, Jeffrey Pennington
By analyzing homogenized SGD, we provide exact non-asymptotic high-dimensional expressions for the generalization performance of SGD in terms of a solution of a Volterra integral equation.
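Volterra integral equations of the second kind, $\psi(t) = F(t) + \int_0^t K(t,s)\,\psi(s)\,ds$, can be solved numerically by a simple trapezoidal time-stepping scheme. The kernel below is a toy convolution kernel with a known closed-form solution ($\psi(t) = 1 + t$), not the paper's SGD kernel; the solver itself is generic.

```python
import numpy as np

def solve_volterra(F, K, T, m):
    """Second-kind Volterra equation psi(t) = F(t) + int_0^t K(t,s) psi(s) ds,
    solved by trapezoidal time stepping on m+1 grid points over [0, T]."""
    t = np.linspace(0.0, T, m + 1)
    h = T / m
    psi = np.empty(m + 1)
    psi[0] = F(t[0])
    for k in range(1, m + 1):
        # trapezoid rule: accumulated history, then solve for the
        # implicit endpoint term psi[k].
        acc = 0.5 * K(t[k], t[0]) * psi[0]
        acc += sum(K(t[k], t[j]) * psi[j] for j in range(1, k))
        psi[k] = (F(t[k]) + h * acc) / (1.0 - 0.5 * h * K(t[k], t[k]))
    return t, psi

# Toy example: psi(t) = 1 + int_0^t e^{-(t-s)} psi(s) ds, whose exact
# solution is psi(t) = 1 + t (differentiate to check: psi'(t) = 1).
t, psi = solve_volterra(lambda s: 1.0,
                        lambda ti, s: np.exp(-(ti - s)),
                        T=2.0, m=200)
err = np.max(np.abs(psi - (1.0 + t)))
print(f"max error vs exact solution 1+t: {err:.2e}")
```

The trapezoidal scheme is second-order accurate in the grid spacing, which is plenty for plotting a predicted risk curve against simulation.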
NeurIPS 2021 • Courtney Paquette, Elliot Paquette
We analyze a class of stochastic gradient algorithms with momentum on a high-dimensional random least squares problem.
8 Feb 2021 • Courtney Paquette, Kiwon Lee, Fabian Pedregosa, Elliot Paquette
We propose a new framework, inspired by random matrix theory, for analyzing the dynamics of stochastic gradient descent (SGD) when both number of samples and dimensions are large.
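The phenomenon behind such a framework is easy to see empirically: when samples and dimensions are both large, the risk trajectory of SGD concentrates around a single deterministic curve, so two independent instances produce nearly identical dynamics. The sizes and step size below are hypothetical.

```python
import numpy as np

def sgd_risk_curve(seed, n=800, d=200, gamma=0.005, steps=4000, every=400):
    """Risk of single-row SGD on a fresh random least-squares instance,
    recorded every `every` steps."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, d))
    b = A @ rng.standard_normal(d)
    x = np.zeros(d)
    curve = []
    for k in range(steps):
        if k % every == 0:
            curve.append(0.5 * np.mean((A @ x - b) ** 2))
        i = rng.integers(n)
        x -= gamma * A[i] * (A[i] @ x - b[i])
    curve.append(0.5 * np.mean((A @ x - b) ** 2))
    return np.array(curve)

# Two independent instances (fresh data AND fresh sampling noise):
# in high dimensions the two risk curves nearly coincide.
c1, c2 = sgd_risk_curve(seed=1), sgd_risk_curve(seed=2)
print(np.round(c1, 3))
print(np.round(c2, 3))
```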
8 Jun 2020 • Courtney Paquette, Bart van Merriënboer, Elliot Paquette, Fabian Pedregosa
In fact, the halting time exhibits a universality property: it is independent of the probability distribution of the input data.
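Universality of this kind can be observed in a small experiment: run gradient descent to a fixed tolerance on random least-squares problems whose matrix entries are drawn from two different distributions with matching variance, and compare the iteration counts. This is an illustrative reproduction of the phenomenon, not the paper's exact setup; all sizes and tolerances are hypothetical.

```python
import numpy as np

def halting_time(sampler, n=800, d=400, step=1/3, tol=1e-6,
                 seed=0, max_iter=5000):
    """Gradient-descent iterations on f(x) = 0.5 ||A x - b||^2 until
    ||grad f|| < tol, for a random matrix A with iid entries."""
    rng = np.random.default_rng(seed)
    A = sampler(rng, (n, d)) / np.sqrt(n)      # normalize entry variance to 1/n
    b = A @ rng.standard_normal(d)             # consistent system, min value 0
    x = np.zeros(d)
    for k in range(max_iter):
        g = A.T @ (A @ x - b)
        if np.linalg.norm(g) < tol:
            return k
        x -= step * g
    return max_iter

# Two entry distributions with the same mean and variance.
gaussian   = lambda rng, shape: rng.standard_normal(shape)
rademacher = lambda rng, shape: rng.choice([-1.0, 1.0], size=shape)

t_gauss = halting_time(gaussian)
t_rad   = halting_time(rademacher)
print(f"halting time, Gaussian entries:   {t_gauss}")
print(f"halting time, Rademacher entries: {t_rad}")
```

Despite the very different entry distributions, the two halting times come out close, consistent with the universality claim.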