1 code implementation • 19 Apr 2023 • Pierre Marion, Raphaël Berthier
We study the training dynamics of shallow neural networks, in a two-timescale regime in which the stepsizes for the inner layer are much smaller than those for the outer layer.
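To make the two-timescale regime concrete, here is a minimal sketch: SGD on a one-hidden-layer ReLU network where the inner-layer stepsize is several orders of magnitude smaller than the outer-layer one. The architecture, the target, and the stepsize values are illustrative assumptions, not the paper's exact setting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1D regression problem (illustrative, not the paper's setting).
n, m = 256, 50                       # samples, hidden units
x = rng.uniform(-1, 1, size=n)
y = np.sin(3 * x)

# Shallow network f(x) = sum_j a_j * relu(w_j * x + b_j).
w = rng.normal(size=m)
b = rng.normal(size=m)
a = rng.normal(size=m) / m

eta_outer = 1e-2                     # stepsize for the outer layer a
eta_inner = 1e-5                     # much smaller stepsize for the inner layer (w, b)

for step in range(5000):
    i = rng.integers(n)
    pre = w * x[i] + b               # pre-activations, shape (m,)
    h = np.maximum(pre, 0.0)         # ReLU features
    err = a @ h - y[i]               # scalar residual
    # Gradients of 0.5 * err**2, one stepsize per timescale.
    g_a = err * h
    g_wb = err * a * (pre > 0)
    a -= eta_outer * g_a             # fast outer-layer dynamics
    w -= eta_inner * g_wb * x[i]     # slow inner-layer dynamics
    b -= eta_inner * g_wb

H = np.maximum(np.outer(x, w) + b, 0.0)
print("train MSE:", np.mean((H @ a - y) ** 2))
```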
no code implementations • 28 Feb 2023 • Raphaël Berthier, Andrea Montanari, Kangjie Zhou
In this paper, we study the gradient flow dynamics of a wide two-layer neural network in high dimension, when data are distributed according to a single-index model (i.e., the target function depends on a one-dimensional projection of the covariates).
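A sketch under simplifying assumptions (full-batch gradient descent as an Euler discretization of the gradient flow, a tanh link function, modest sizes): data are drawn from a single-index model and a wide two-layer ReLU network with mean-field-scaled outer weights is trained on them. All constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

d, n, m = 10, 200, 400                    # input dim, samples, width (wide regime)
w_star = np.zeros(d); w_star[0] = 1.0     # the single-index direction

# Single-index model: y depends on x only through the projection <w_star, x>.
X = rng.normal(size=(n, d))
y = np.tanh(X @ w_star)                   # illustrative link function

W = rng.normal(size=(m, d)) / np.sqrt(d)  # inner weights
a = rng.choice([-1.0, 1.0], size=m) / m   # outer weights, mean-field 1/m scaling

def loss():
    H = np.maximum(X @ W.T, 0.0)          # (n, m) ReLU features
    return 0.5 * np.mean((H @ a - y) ** 2)

eta = 0.01                                # small step = Euler scheme for the gradient flow
print("initial loss:", loss())
for _ in range(1000):
    H = np.maximum(X @ W.T, 0.0)
    resid = H @ a - y                     # (n,)
    g_a = H.T @ resid / n
    g_W = ((resid[:, None] * (X @ W.T > 0)) * a).T @ X / n
    a -= eta * g_a
    W -= eta * g_W
print("final loss:", loss())
```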
no code implementations • 31 Aug 2022 • Raphaël Berthier
Diagonal linear networks (DLNs) are a toy simplification of artificial neural networks; they consist of a quadratic reparametrization of linear regression that induces a sparse implicit regularization.
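A minimal sketch of a DLN, assuming the common entrywise parametrization $\beta = u \odot v$ (conventions vary across papers): gradient descent from a small initialization on an underdetermined least-squares problem recovers a sparse solution, illustrating the implicit regularization.

```python
import numpy as np

rng = np.random.default_rng(2)

# Underdetermined least squares with a sparse, nonnegative target.
# (Nonnegative so that the symmetric u = v initialization, which keeps
# beta = u * v >= 0, can represent it; signed variants use u*u - v*v.)
n, d, s = 40, 100, 3
beta_star = np.zeros(d); beta_star[:s] = 1.0
X = rng.normal(size=(n, d)) / np.sqrt(n)
y = X @ beta_star

alpha = 1e-3                      # small initialization scale: the key to sparsity
u = alpha * np.ones(d)
v = alpha * np.ones(d)

eta = 0.05
for _ in range(20000):
    beta = u * v                  # quadratic reparametrization of the predictor
    grad = X.T @ (X @ beta - y)   # gradient of 0.5 * ||X beta - y||^2 w.r.t. beta
    u, v = u - eta * grad * v, v - eta * grad * u   # chain rule through beta = u * v

beta = u * v
print("entries above 1e-2:", int(np.sum(np.abs(beta) > 1e-2)), "out of", d)
print("max error on the support:", np.max(np.abs(beta[:s] - 1.0)))
```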
no code implementations • NeurIPS 2021 • Mathieu Even, Raphaël Berthier, Francis Bach, Nicolas Flammarion, Hadrien Hendrikx, Pierre Gaillard, Laurent Massoulié, Adrien Taylor
We introduce the "continuized" Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter.
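The structure is easy to simulate: between the jump times of a rate-one Poisson process, $x_t$ and $z_t$ mix through a linear ODE, and at each jump both take a gradient step. The sketch below uses plausible parameter scalings for a strongly convex quadratic; these constants are assumptions, not the paper's exact schedules.

```python
import numpy as np

rng = np.random.default_rng(3)

# Strongly convex quadratic f(x) = 0.5 * x^T H x with known mu and L.
d = 10
eigs = np.linspace(0.1, 1.0, d)
H = np.diag(eigs)
mu, L = eigs.min(), eigs.max()

# Assumed parameter scalings (illustrative, not taken from the paper).
eta = np.sqrt(mu / L)                    # mixing rate between x and z
gamma, gamma_z = 1.0 / L, 1.0 / np.sqrt(mu * L)

x = rng.normal(size=d)
z = x.copy()
T, horizon = 0.0, 200.0
while T < horizon:
    tau = rng.exponential(1.0)           # inter-arrival time of the Poisson clock
    T += tau
    # Between jumps: dx = eta*(z-x)dt, dz = eta*(x-z)dt. This linear ODE
    # integrates in closed form: the mean (x+z)/2 is constant and the
    # difference contracts at rate 2*eta.
    mean, diff = (x + z) / 2, (x - z) / 2
    diff = diff * np.exp(-2 * eta * tau)
    x, z = mean + diff, mean - diff
    # At the jump: gradient step at x, with different stepsizes for x and z.
    g = H @ x
    x, z = x - gamma * g, z - gamma_z * g

print("f(x) after the horizon:", 0.5 * x @ H @ x)
```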
no code implementations • 24 Sep 2021 • Cédric Gerbelot, Raphaël Berthier
Approximate message passing (AMP) algorithms have become an important element of high-dimensional statistical inference, mostly due to their adaptability and their concentration properties, which are captured by the state evolution (SE) equations.
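As a concrete instance, here is a textbook-style AMP iteration for a rank-one spiked matrix model with a tanh denoiser; the Onsager correction term is what keeps the iterates behaving like signal plus Gaussian noise, the regime tracked by the SE equations. The model, denoiser, and constants are illustrative, not taken from this paper.

```python
import numpy as np

rng = np.random.default_rng(4)

# Spiked Wigner model: Y = (snr/n) * x x^T + symmetric Gaussian noise.
n, snr = 2000, 2.0
x_star = rng.choice([-1.0, 1.0], size=n)
G = rng.normal(size=(n, n))
W = (G + G.T) / np.sqrt(2 * n)           # symmetric, entries ~ N(0, 1/n)
Y = (snr / n) * np.outer(x_star, x_star) + W

# AMP with a tanh denoiser. Finite-size fluctuations seed the overlap,
# which grows because snr > 1.
x = 0.1 * rng.normal(size=n)
f_old = np.zeros(n)
for _ in range(25):
    f = np.tanh(x)
    b = np.mean(1.0 - f ** 2)            # (1/n) * sum_i f'(x_i)
    # The Onsager memory term -b * f_old is what distinguishes AMP from
    # naive power iteration and keeps the iterates Gaussian-like, as
    # described by the state evolution equations.
    x = Y @ f - b * f_old
    f_old = f

print("overlap with the spike:", abs(np.mean(np.tanh(x) * x_star)))
```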
1 code implementation • 10 Jun 2021 • Mathieu Even, Raphaël Berthier, Francis Bach, Nicolas Flammarion, Pierre Gaillard, Hadrien Hendrikx, Laurent Massoulié, Adrien Taylor
We introduce the continuized Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter.
no code implementations • 11 Feb 2021 • Raphaël Berthier, Francis Bach, Nicolas Flammarion, Pierre Gaillard, Adrien Taylor
We introduce the "continuized" Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter.
Distributed, Parallel, and Cluster Computing • Optimization and Control
no code implementations • NeurIPS 2020 • Raphaël Berthier, Francis Bach, Pierre Gaillard
In the context of statistical supervised learning, the noiseless linear model assumes that there exists a deterministic linear relation $Y = \langle \theta_*, \Phi(U) \rangle$ between the random output $Y$ and the random feature vector $\Phi(U)$, a potentially non-linear transformation of the inputs $U$.
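A quick simulation of this model, with an assumed random-Fourier feature map standing in for $\Phi$: constant-stepsize SGD on the square loss converges under the noiseless linear model, without the stepsize decay that noisy settings require. Names and constants below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

d = 30
freqs = rng.normal(size=d)               # fixed random frequencies
phases = rng.uniform(0, 2 * np.pi, size=d)

def Phi(u):
    # An assumed nonlinear feature map of a scalar input U.
    return np.cos(freqs * u + phases) / np.sqrt(d)

theta_star = rng.normal(size=d)

theta = np.zeros(d)
eta = 1.0                                # constant stepsize, admissible without noise
for step in range(20000):
    u = rng.uniform(-1, 1)
    x = Phi(u)
    y = theta_star @ x                   # noiseless: Y is exactly <theta_*, Phi(U)>
    theta -= eta * (theta @ x - y) * x   # SGD on the square loss

U_test = rng.uniform(-1, 1, size=1000)
X_test = np.stack([Phi(u) for u in U_test])
print("test MSE:", np.mean((X_test @ (theta - theta_star)) ** 2))
```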
1 code implementation • 22 May 2018 • Raphaël Berthier, Francis Bach, Pierre Gaillard
We develop a method for solving the gossip problem whose performance depends only on the spectral dimension of the network, that is, in the communication network set-up, the dimension of the space in which the agents live.
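For context, below is the unaccelerated baseline that such methods improve upon: simple gossip on a ring, where each agent repeatedly averages with its neighbours and consensus is reached diffusively slowly (the ring is a low-spectral-dimension network). The paper's accelerated method is not reproduced here; this sketch only shows the baseline.

```python
import numpy as np

rng = np.random.default_rng(6)

# Simple (unaccelerated) gossip on a ring of n agents: x <- W x with a
# doubly stochastic averaging matrix W. Convergence to consensus is
# diffusively slow on such low-dimensional networks.
n = 100
x = rng.normal(size=n)                   # initial values held by the agents
avg = x.mean()                           # the consensus target (preserved by W)

for t in range(20000):
    # Each agent averages with its two ring neighbours.
    x = 0.5 * x + 0.25 * np.roll(x, 1) + 0.25 * np.roll(x, -1)

print("max deviation from the average:", np.max(np.abs(x - avg)))
```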