1 code implementation • 4 Jun 2024 • Luca Arnaboldi, Yatin Dandi, Florent Krzakala, Bruno Loureiro, Luca Pesce, Ludovic Stephan
We study the impact of the batch size $n_b$ on the iteration time $T$ of training two-layer neural networks with one-pass stochastic gradient descent (SGD) on multi-index target functions of isotropic covariates.
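As an illustration of this setup, here is a minimal numpy sketch (the link function, width, and step size are illustrative assumptions, not the paper's choices):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, p, n_b, lr, steps = 500, 2, 64, 32, 0.1, 2000

W_star = rng.standard_normal((k, d)) / np.sqrt(d)    # hidden low-dim subspace
g = lambda z: z[:, 0] * z[:, 1]                      # example multi-index link

W = rng.standard_normal((p, d)) / np.sqrt(d)         # first-layer weights
a = rng.standard_normal(p) / np.sqrt(p)              # readout weights

for t in range(steps):                               # one-pass: fresh batch each step
    X = rng.standard_normal((n_b, d))                # isotropic covariates
    y = g(X @ W_star.T)                              # labels depend on k directions only
    h = np.tanh(X @ W.T)                             # (n_b, p) hidden activations
    err = h @ a - y                                  # residuals for the square loss
    grad_a = h.T @ err / n_b
    grad_W = ((err[:, None] * a) * (1 - h**2)).T @ X / n_b
    a -= lr * grad_a
    W -= lr * grad_W

print("final batch MSE:", np.mean(err**2))
```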
no code implementations • 24 May 2024 • Leonardo Defilippis, Bruno Loureiro, Theodor Misiakiewicz
Our main contribution is a general deterministic equivalent for the test error of random features ridge regression (RFRR).
1 code implementation • 24 May 2024 • Emanuele Troiani, Yatin Dandi, Leonardo Defilippis, Lenka Zdeborová, Bruno Loureiro, Florent Krzakala
Multi-index models, functions which depend on the covariates only through a non-linear transformation of their projection onto a subspace (for instance $f(x) = g(Wx)$ for a $k \times d$ matrix $W$ with $k \ll d$), are a useful benchmark for investigating feature learning with neural networks.
no code implementations • 21 Feb 2024 • Lucas Clarté, Adrien Vandenbroucque, Guillaume Dalle, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová
We investigate popular resampling methods for estimating the uncertainty of statistical models, such as subsampling, bootstrap and the jackknife, and their performance in high-dimensional supervised regression tasks.
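For concreteness, a minimal sketch (illustrative sizes and a ridge base estimator, not the paper's exact protocol) comparing bootstrap and jackknife estimates of a prediction's uncertainty:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, lam = 200, 50, 1.0
theta = rng.standard_normal(d) / np.sqrt(d)
X = rng.standard_normal((n, d))
y = X @ theta + rng.standard_normal(n)

def ridge(X, y, lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

x_test = rng.standard_normal(d)

# Bootstrap: refit on rows resampled with replacement.
boot = []
for _ in range(200):
    idx = rng.integers(0, n, size=n)
    boot.append(x_test @ ridge(X[idx], y[idx], lam))

# Jackknife: refit leaving one observation out at a time.
jack = np.array([x_test @ ridge(np.delete(X, i, 0), np.delete(y, i), lam)
                 for i in range(n)])
var_jack = (n - 1) * np.mean((jack - jack.mean()) ** 2)

print("bootstrap std:", np.std(boot), "jackknife std:", np.sqrt(var_jack))
```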
1 code implementation • 21 Feb 2024 • Dominik Schröder, Daniil Dmitriev, Hugo Cui, Bruno Loureiro
For a large class of feature maps we provide a tight asymptotic characterisation of the test error associated with learning the readout layer, in the high-dimensional limit where the input dimension, hidden layer widths, and number of training samples are proportionally large.
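A minimal sketch of the setting (the tanh random-feature map and ridge readout are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
d, p, n, n_test, lam = 300, 600, 900, 2000, 1e-2     # all dimensions comparable

F = rng.standard_normal((p, d)) / np.sqrt(d)         # fixed random first layer
w_star = rng.standard_normal(d) / np.sqrt(d)         # teacher direction

X, Xt = rng.standard_normal((n, d)), rng.standard_normal((n_test, d))
y, yt = X @ w_star, Xt @ w_star

Z = np.tanh(X @ F.T)                                 # train the readout only
a = np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ y)
print("test error:", np.mean((np.tanh(Xt @ F.T) @ a - yt) ** 2))
```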
no code implementations • 8 Feb 2024 • Kasimir Tanner, Matteo Vilucchio, Bruno Loureiro, Florent Krzakala
This work investigates adversarial training in the context of margin-based linear classifiers in the high-dimensional regime where the dimension $d$ and the number of data points $n$ diverge with a fixed ratio $\alpha = n / d$.
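For a linear classifier the worst-case $\ell_2$ perturbation of radius $\varepsilon$ is available in closed form, which reduces adversarial training to shrinking the margin; a minimal sketch (a toy Gaussian setup, not the paper's model):

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, eps, lr, steps = 100, 500, 0.3, 0.1, 500       # alpha = n / d = 5

w_star = rng.standard_normal(d) / np.sqrt(d)
X = rng.standard_normal((n, d))
y = np.sign(X @ w_star)

w = np.zeros(d)
for _ in range(steps):
    norm_w = np.linalg.norm(w)
    margin = y * (X @ w) - eps * norm_w              # worst-case (robust) margin
    s = -1 / (1 + np.exp(margin))                    # logistic loss derivative in margin
    grad = (s * y) @ X / n                           # data term of the gradient
    if norm_w > 0:
        grad -= s.mean() * eps * w / norm_w          # margin-shrinkage term
    w -= lr * grad

print("robust accuracy:", np.mean(y * (X @ w) - eps * np.linalg.norm(w) > 0))
```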
1 code implementation • 7 Feb 2024 • Hugo Cui, Luca Pesce, Yatin Dandi, Florent Krzakala, Yue M. Lu, Lenka Zdeborová, Bruno Loureiro
In this manuscript, we investigate how two-layer neural networks learn features from data and improve over the kernel regime after being trained with a single gradient descent step.
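A minimal sketch of the protocol (the step-size scaling and teacher are illustrative assumptions): take one large full-batch gradient step on the first layer, refit the readout with ridge, and compare against the fixed random-feature baseline:

```python
import numpy as np

rng = np.random.default_rng(4)
d, p, n, lam = 200, 200, 800, 1e-3
eta = 5.0 * np.sqrt(p)                               # large step scaling (assumed)

w_star = rng.standard_normal(d) / np.sqrt(d)
X = rng.standard_normal((n, d)); y = np.maximum(X @ w_star, 0)
Xt = rng.standard_normal((4000, d)); yt = np.maximum(Xt @ w_star, 0)

W0 = rng.standard_normal((p, d)) / np.sqrt(d)
a0 = rng.standard_normal(p) / np.sqrt(p)

h = np.tanh(X @ W0.T)                                # one gradient step on W only
err = h @ a0 - y
W1 = W0 - eta * ((err[:, None] * a0) * (1 - h ** 2)).T @ X / n

for name, W in [("random features", W0), ("one step", W1)]:
    Z = np.tanh(X @ W.T)                             # ridge readout on top of W
    a = np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ y)
    print(name, "test error:", np.mean((np.tanh(Xt @ W.T) @ a - yt) ** 2))
```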
no code implementations • 28 Sep 2023 • Urte Adomaityte, Leonardo Defilippis, Bruno Loureiro, Gabriele Sicuro
In particular, we provide a sharp asymptotic characterisation of M-estimators trained on a family of elliptical covariate and noise data distributions including cases where second and higher moments do not exist.
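As a toy illustration of such heavy-tailed elliptical data (multivariate Student-t with two degrees of freedom, whose second moment diverges), comparing least squares with a Huber M-estimator fitted by IRLS:

```python
import numpy as np

rng = np.random.default_rng(6)
n, d, nu = 400, 100, 2.0                             # nu = 2: infinite variance

def student_t(n, d, nu):                             # elliptical: Gaussian / chi
    g = rng.standard_normal((n, d))
    return g * np.sqrt(nu / rng.chisquare(nu, size=n))[:, None]

theta = rng.standard_normal(d) / np.sqrt(d)
X = student_t(n, d, nu)
y = X @ theta + student_t(n, 1, nu).ravel()          # heavy-tailed noise too

def huber_irls(X, y, delta=1.0, iters=50):
    w = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(iters):
        r = y - X @ w
        s = np.minimum(1.0, delta / np.maximum(np.abs(r), 1e-12))  # Huber weights
        w = np.linalg.solve((X * s[:, None]).T @ X, (X * s[:, None]).T @ y)
    return w

w_ls = np.linalg.lstsq(X, y, rcond=None)[0]
print("LS error:   ", np.linalg.norm(w_ls - theta))
print("Huber error:", np.linalg.norm(huber_irls(X, y) - theta))
```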
2 code implementations • 29 May 2023 • Luca Arnaboldi, Florent Krzakala, Bruno Loureiro, Ludovic Stephan
These insights are grounded in the reduction of SGD dynamics to a stochastic process in lower dimensions, where escaping mediocrity equates to calculating an exit time.
1 code implementation • 29 May 2023 • Yatin Dandi, Florent Krzakala, Bruno Loureiro, Luca Pesce, Ludovic Stephan
The picture drastically improves over multiple gradient steps: we show that a batch size of $n = \mathcal{O}(d)$ is indeed enough to learn multiple target directions satisfying a staircase property, where more and more directions can be learned over time.
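A staircase target, for concreteness (the specific polynomial is an illustrative example): each new direction enters only multiplied by previously learnable ones, so it becomes detectable only after the earlier directions are picked up:

```python
import numpy as np

def staircase(Z):
    # Columns of Z are the projections onto the k = 3 target directions.
    return Z[:, 0] + Z[:, 0] * Z[:, 1] + Z[:, 0] * Z[:, 1] * Z[:, 2]

rng = np.random.default_rng(5)
d, k, n = 100, 3, 10
W_star = np.linalg.qr(rng.standard_normal((d, k)))[0].T  # orthonormal rows
X = rng.standard_normal((n, d))
y = staircase(X @ W_star.T)
```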
2 code implementations • 5 Mar 2023 • Lucas Clarté, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová
Despite their incredible performance, it is well documented that deep neural networks tend to be overoptimistic about their prediction confidence.
1 code implementation • 17 Feb 2023 • Luca Pesce, Florent Krzakala, Bruno Loureiro, Ludovic Stephan
Motivated by the recent stream of results on the Gaussian universality of the test and training errors in generalized linear estimation, we ask: "when is a single Gaussian enough to characterize the error?".
1 code implementation • 12 Feb 2023 • Luca Arnaboldi, Ludovic Stephan, Florent Krzakala, Bruno Loureiro
This manuscript investigates the one-pass stochastic gradient descent (SGD) dynamics of a two-layer neural network trained on Gaussian data and labels generated by a similar, though not necessarily identical, target function.
1 code implementation • 1 Feb 2023 • Dominik Schröder, Hugo Cui, Daniil Dmitriev, Bruno Loureiro
Establishing this result requires proving a deterministic equivalent for traces of the deep random features sample covariance matrices which can be of independent interest.
1 code implementation • 23 Oct 2022 • Lucas Clarté, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová
Uncertainty quantification is a central challenge in reliable and trustworthy machine learning.
1 code implementation • 26 May 2022 • Luca Pesce, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová
A simple model to study subspace clustering is the high-dimensional $k$-Gaussian mixture model where the cluster means are sparse vectors.
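The data model in a few lines (sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
d, k, n, s = 1000, 3, 500, 10                        # s non-zeros per cluster mean

means = np.zeros((k, d))
for c in range(k):
    support = rng.choice(d, size=s, replace=False)   # sparse support
    means[c, support] = rng.standard_normal(s)

labels = rng.integers(0, k, size=n)
X = means[labels] + rng.standard_normal((n, d))      # isotropic noise around means
```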
2 code implementations • 26 May 2022 • Federica Gerace, Florent Krzakala, Bruno Loureiro, Ludovic Stephan, Lenka Zdeborová
We argue that there is a large universality class of high-dimensional input data for which we obtain the same minimum training loss as for Gaussian data with corresponding data covariance.
1 code implementation • 22 Mar 2022 • Elisabetta Cornacchia, Francesca Mignacco, Rodrigo Veiga, Cédric Gerbelot, Bruno Loureiro, Lenka Zdeborová
For Gaussian teacher weights, we investigate the performance of ERM with both cross-entropy and square losses, and explore the role of ridge regularisation in approaching Bayes-optimality.
1 code implementation • 7 Feb 2022 • Lucas Clarté, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová
In this manuscript, we characterise uncertainty for learning from a limited number of samples of high-dimensional Gaussian input data and labels generated by the probit model.
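The probit data model in a few lines (sizes are illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(11)
n, d, sigma = 100, 500, 0.5                          # few samples, high dimension

w_star = rng.standard_normal(d) / np.sqrt(d)
X = rng.standard_normal((n, d))
p = norm.cdf(X @ w_star / sigma)                     # probit link probabilities
y = 2 * (rng.random(n) < p) - 1                      # labels in {-1, +1}
```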
2 code implementations • 1 Feb 2022 • Rodrigo Veiga, Ludovic Stephan, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová
Despite the non-convex optimization landscape, over-parametrized shallow networks are able to achieve global convergence under gradient descent.
no code implementations • 31 Jan 2022 • Bruno Loureiro, Cédric Gerbelot, Maria Refinetti, Gabriele Sicuro, Florent Krzakala
From the sampling of data to the initialisation of parameters, randomness is ubiquitous in modern Machine Learning practice.
no code implementations • 29 Jan 2022 • Hugo Cui, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová
We find that our rates tightly describe the learning curves for this class of data sets, and are also observed on real data.
no code implementations • 19 Jan 2022 • Ali Bereyhi, Bruno Loureiro, Florent Krzakala, Ralf R. Müller, Hermann Schulz-Baldes
Unlike the classical linear model, nonlinear generative models have received only sparse attention in the statistical learning literature.
2 code implementations • 7 Jun 2021 • Bruno Loureiro, Gabriele Sicuro, Cédric Gerbelot, Alessandro Pacco, Florent Krzakala, Lenka Zdeborová
Generalised linear models for multi-class classification problems are one of the fundamental building blocks of modern machine learning tasks.
no code implementations • NeurIPS 2021 • Hugo Cui, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová
Here, we unify and extend this line of work, providing a characterization of all regimes and excess error decay rates that can be observed in terms of the interplay of noise and regularization.
1 code implementation • NeurIPS 2021 • Bruno Loureiro, Cédric Gerbelot, Hugo Cui, Sebastian Goldt, Florent Krzakala, Marc Mézard, Lenka Zdeborová
While still solvable in a closed form, this generalization is able to capture the learning curves for a broad range of realistic data sets, thus redeeming the potential of the teacher-student framework.
1 code implementation • 25 Jun 2020 • Sebastian Goldt, Bruno Loureiro, Galen Reeves, Florent Krzakala, Marc Mézard, Lenka Zdeborová
Here, we go beyond this simple paradigm by studying the performance of neural networks trained on data drawn from pre-trained generative models.
1 code implementation • NeurIPS 2020 • Antoine Maillard, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová
We consider the phase retrieval problem of reconstructing an $n$-dimensional real or complex signal $\mathbf{X}^{\star}$ from $m$ (possibly noisy) observations $Y_\mu = | \sum_{i=1}^n \Phi_{\mu i} X^{\star}_i/\sqrt{n}|$, for a large class of correlated real and complex random sensing matrices $\mathbf{\Phi}$, in a high-dimensional setting where $m, n\to\infty$ while $\alpha = m/n=\Theta(1)$.
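A few lines generating observations from exactly this model (an i.i.d. complex Gaussian $\mathbf{\Phi}$ stands in for the correlated ensembles treated in the paper):

```python
import numpy as np

rng = np.random.default_rng(8)
n, m = 256, 512                                      # alpha = m / n = 2

x_star = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
Phi = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
Y = np.abs(Phi @ x_star / np.sqrt(n))                # Y_mu = |sum_i Phi_mu,i x*_i / sqrt(n)|
```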
no code implementations • ICML 2020 • Federica Gerace, Bruno Loureiro, Florent Krzakala, Marc Mézard, Lenka Zdeborová
In particular, we show how to obtain analytically the so-called double descent behaviour for logistic regression, with a peak at the interpolation threshold; we illustrate the superiority of orthogonal over random Gaussian projections in learning with random features; and we discuss the role played by correlations in the data generated by the hidden manifold model.
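An empirical version of the double-descent sweep (illustrative sizes; the tanh random-feature map and weak ridge penalty are assumptions):

```python
import numpy as np

rng = np.random.default_rng(9)
d, n, n_test = 50, 200, 2000
w_star = rng.standard_normal(d) / np.sqrt(d)
X = rng.standard_normal((n, d)); y = np.sign(X @ w_star)
Xt = rng.standard_normal((n_test, d)); yt = np.sign(Xt @ w_star)

def fit_logistic(Z, y, lam=1e-4, lr=0.5, steps=2000):
    a = np.zeros(Z.shape[1])
    for _ in range(steps):
        q = 1 / (1 + np.exp(-Z @ a))                 # predicted P(y = +1)
        a -= lr * (Z.T @ (q - (y + 1) / 2) / len(y) + lam * a)
    return a

for p in [50, 100, 150, 200, 250, 400, 800]:         # interpolation near p = n
    F = rng.standard_normal((p, d)) / np.sqrt(d)
    a = fit_logistic(np.tanh(X @ F.T), y)
    err = np.mean(np.sign(np.tanh(Xt @ F.T) @ a) != yt)
    print(f"p = {p:4d}  test error = {err:.3f}")
```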
no code implementations • 4 Dec 2019 • Benjamin Aubin, Bruno Loureiro, Antoine Baker, Florent Krzakala, Lenka Zdeborová
We consider the problem of compressed sensing and of (real-valued) phase retrieval with random measurement matrix.
2 code implementations • NeurIPS 2019 • Benjamin Aubin, Bruno Loureiro, Antoine Maillard, Florent Krzakala, Lenka Zdeborová
Here, we replace the sparsity assumption by generative modelling, and investigate the consequences on statistical and algorithmic properties.
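A minimal instance of the model (a random single-layer ReLU generator stands in for a pre-trained one):

```python
import numpy as np

rng = np.random.default_rng(10)
k, n, m = 20, 400, 150                               # latent dim << signal dim

A = rng.standard_normal((n, k)) / np.sqrt(k)         # generator weights
z_star = rng.standard_normal(k)
x_star = np.maximum(A @ z_star, 0)                   # signal x* = G(z*) = relu(A z*)

Phi = rng.standard_normal((m, n)) / np.sqrt(n)       # m < n measurement matrix
y = Phi @ x_star                                     # compressed observations
```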