1 code implementation • NeurIPS 2023 • Berfin Şimşek, Amire Bendjeddou, Wulfram Gerstner, Johanni Brea
Approximating $f^*$ with a neural network with $n< k$ neurons can thus be seen as fitting an under-parameterized "student" network with $n$ neurons to a "teacher" network with $k$ neurons.
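The teacher-student setup can be sketched in a few lines; the sizes ($d=5$ inputs, $k=8$ teacher neurons, $n=3$ student neurons), tanh activations, and plain full-batch gradient descent below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def net(x, W, a):
    # two-layer network: sum_i a_i * tanh(w_i . x)
    return np.tanh(x @ W.T) @ a

d, k, n = 5, 8, 3                       # input dim, teacher width, student width (n < k)
W_t, a_t = rng.normal(size=(k, d)), rng.normal(size=k)             # fixed teacher
W_s, a_s = 0.1 * rng.normal(size=(n, d)), 0.1 * rng.normal(size=n) # small student init

X = rng.normal(size=(256, d))
y = net(X, W_t, a_t)                    # labels come from the teacher

loss0 = np.mean((net(X, W_s, a_s) - y) ** 2)
lr = 0.02
for _ in range(3000):
    h = np.tanh(X @ W_s.T)              # hidden activations, shape (256, n)
    err = h @ a_s - y
    g_a = h.T @ err / len(X)
    g_W = ((err[:, None] * a_s * (1 - h ** 2)).T @ X) / len(X)
    a_s -= lr * g_a
    W_s -= lr * g_W

loss = np.mean((net(X, W_s, a_s) - y) ** 2)  # residual error of the fit
```

Because $n < k$, the loss generally cannot reach zero: the under-parameterized student converges to an approximation of the teacher rather than an exact copy.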
no code implementations • 29 Sep 2023 • Erin Grant, Sandra Nestler, Berfin Şimşek, Sara Solla
Lecture notes from the course given by Professor Sara A. Solla at the Les Houches summer school on "Statistical physics of Machine Learning".
2 code implementations • 25 Jan 2023 • Johanni Brea, Flavio Martinelli, Berfin Şimşek, Wulfram Gerstner
MLPGradientFlow is a software package for numerically solving the gradient flow differential equation $\dot \theta = -\nabla \mathcal L(\theta; \mathcal D)$, where $\theta$ are the parameters of a multi-layer perceptron, $\mathcal D$ is some data set, and $\nabla \mathcal L$ is the gradient of a loss function.
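MLPGradientFlow itself is a Julia package; purely as an illustration of the equation above, the same ODE can be integrated in Python with an off-the-shelf solver. The network size, data, and integration horizon below are arbitrary choices, not the package's API or defaults.

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(1)
X = rng.normal(size=(64, 2))            # data set D
y = np.sin(X[:, 0])

def unpack(theta):
    W = theta[:6].reshape(3, 2)         # hidden weights of a 3-unit tanh perceptron
    a = theta[6:]                       # output weights
    return W, a

def theta_dot(t, theta):
    # right-hand side of the gradient flow: -grad L(theta; D)
    W, a = unpack(theta)
    h = np.tanh(X @ W.T)
    err = h @ a - y
    g_a = h.T @ err / len(X)
    g_W = ((err[:, None] * a * (1 - h ** 2)).T @ X) / len(X)
    return -np.concatenate([g_W.ravel(), g_a])

theta0 = 0.5 * rng.normal(size=9)
W0, a0 = unpack(theta0)
loss0 = 0.5 * np.mean((np.tanh(X @ W0.T) @ a0 - y) ** 2)

sol = solve_ivp(theta_dot, (0.0, 50.0), theta0, method="RK45")
W, a = unpack(sol.y[:, -1])
final_loss = 0.5 * np.mean((np.tanh(X @ W.T) @ a - y) ** 2)
```

Along the flow the loss is non-increasing by construction, since $\frac{d}{dt}\mathcal L(\theta(t)) = -\|\nabla \mathcal L(\theta(t))\|^2 \le 0$.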
no code implementations • 30 Jun 2021 • Arthur Jacot, François Ged, Berfin Şimşek, Clément Hongler, Franck Gabriel
The dynamics of Deep Linear Networks (DLNs) is dramatically affected by the variance $\sigma^2$ of the parameters at initialization $\theta_0$.
1 code implementation • 25 May 2021 • Berfin Şimşek, François Ged, Arthur Jacot, Francesco Spadaro, Clément Hongler, Wulfram Gerstner, Johanni Brea
For a two-layer overparameterized network of width $r^* + h =: m$ we explicitly describe the manifold of global minima: it consists of $T(r^*, m)$ affine subspaces of dimension at least $h$ that are connected to one another.
no code implementations • NeurIPS 2020 • Arthur Jacot, Berfin Şimşek, Francesco Spadaro, Clément Hongler, Franck Gabriel
Under a natural universality assumption (that the KRR moments depend asymptotically on the first two moments of the observations) we capture the mean and variance of the KRR predictor.
no code implementations • ICML 2020 • Arthur Jacot, Berfin Şimşek, Francesco Spadaro, Clément Hongler, Franck Gabriel
We investigate, by means of random matrix theory, the connection between Gaussian RF models and Kernel Ridge Regression (KRR).
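One way to see that connection empirically is with random Fourier features as the RF model; as the number $m$ of features grows, the RF ridge predictor approaches the KRR predictor with the limiting Gaussian kernel. The data sizes, ridge level, and feature counts below are illustrative assumptions, not the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, lam = 40, 3, 0.1
X, y = rng.normal(size=(n, d)), rng.normal(size=n)
Xtest = rng.normal(size=(10, d))

def gauss_kernel(A, B):
    # exp(-|x - x'|^2 / 2), the limiting kernel of the random features below
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2)

# exact KRR predictor on the test points
K = gauss_kernel(X, X)
krr_pred = gauss_kernel(Xtest, X) @ np.linalg.solve(K + lam * np.eye(n), y)

def rf_predict(m):
    # RF model: m random Fourier features, then ridge regression in feature space
    Omega = rng.normal(size=(d, m))
    b = rng.uniform(0.0, 2.0 * np.pi, size=m)
    phi = lambda A: np.sqrt(2.0 / m) * np.cos(A @ Omega + b)
    Phi = phi(X)
    w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(m), Phi.T @ y)
    return phi(Xtest) @ w

err_small = np.linalg.norm(rf_predict(50) - krr_pred)    # few features: rough match
err_large = np.linalg.norm(rf_predict(5000) - krr_pred)  # many features: close to KRR
```

The identity behind the sketch: the RF ridge predictor equals KRR with the empirical kernel $K_m = \Phi \Phi^\top$, and $K_m \to K$ as $m \to \infty$.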