Search Results for author: Berfin Şimşek

Found 7 papers, 3 papers with code

Should Under-parameterized Student Networks Copy or Average Teacher Weights?

1 code implementation · NeurIPS 2023 · Berfin Şimşek, Amire Bendjeddou, Wulfram Gerstner, Johanni Brea

Approximating a target function $f^*$, itself given by a neural network with $k$ neurons, using a network with $n < k$ neurons can be seen as fitting an under-parameterized "student" network with $n$ neurons to a "teacher" network with $k$ neurons.
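As a concrete illustration of this setup, here is a minimal NumPy sketch (the widths, tanh activation, batch size, and plain gradient descent are illustrative assumptions, not the paper's experimental protocol): a fixed teacher with $k$ neurons generates the targets, and an under-parameterized student with $n < k$ neurons is fit to it. Whether trained student neurons copy individual teacher neurons or average groups of them is the question the paper studies.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 5, 8, 3        # input dim, teacher width k, student width n < k

# fixed "teacher" network with k hidden neurons
W_t, a_t = rng.normal(size=(k, d)), rng.normal(size=k)

def net(x, W, a):
    """Two-layer network: sum_i a_i * tanh(w_i . x)."""
    return np.tanh(x @ W.T) @ a

# "student" with n < k neurons, trained by plain gradient descent on MSE
W, a = 0.1 * rng.normal(size=(n, d)), 0.1 * rng.normal(size=n)
lr = 1e-2
for step in range(5000):
    x = rng.normal(size=(256, d))           # fresh Gaussian inputs each step
    h = np.tanh(x @ W.T)                    # student hidden activity, (256, n)
    err = h @ a - net(x, W_t, a_t)          # residual against the teacher
    grad_a = h.T @ err / len(x)
    grad_W = ((err[:, None] * a) * (1 - h ** 2)).T @ x / len(x)
    a -= lr * grad_a
    W -= lr * grad_W

# inspect whether student neurons align with (copy) single teacher
# directions or sit between several of them (average)
print(np.round(W @ W_t.T, 2))               # student/teacher weight overlaps
```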

Statistical physics, Bayesian inference and neural information processing

no code implementations · 29 Sep 2023 · Erin Grant, Sandra Nestler, Berfin Şimşek, Sara Solla

Lecture notes from the course given by Professor Sara A. Solla at the Les Houches summer school on "Statistical Physics of Machine Learning".

Bayesian Inference · Dimensionality Reduction

MLPGradientFlow: going with the flow of multilayer perceptrons (and finding minima fast and accurately)

2 code implementations · 25 Jan 2023 · Johanni Brea, Flavio Martinelli, Berfin Şimşek, Wulfram Gerstner

MLPGradientFlow is a software package for numerically solving the gradient flow differential equation $\dot \theta = -\nabla \mathcal L(\theta; \mathcal D)$, where $\theta$ are the parameters of a multi-layer perceptron, $\mathcal D$ is a data set, and $\nabla \mathcal L$ is the gradient of a loss function.
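The equation above is an ordinary differential equation in parameter space, so its solution can be approximated with a generic ODE integrator. The sketch below is not MLPGradientFlow itself (that is a Julia package); the tiny network, toy data, and SciPy solver here are assumptions for illustration of what "integrating the gradient flow" means:

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(1)
X = rng.normal(size=(64, 2))          # toy data set D
y = np.sin(X[:, 0])                   # toy targets
d, width = 2, 4

def grad_flow(t, theta):
    """Right-hand side of the gradient flow: -grad L(theta; D) for a
    one-hidden-layer tanh perceptron with squared-error loss."""
    W = theta[:width * d].reshape(width, d)
    a = theta[width * d:]
    h = np.tanh(X @ W.T)              # hidden activations
    err = h @ a - y                   # residuals
    gW = ((err[:, None] * a) * (1 - h ** 2)).T @ X / len(X)
    ga = h.T @ err / len(X)
    return -np.concatenate([gW.ravel(), ga])

theta0 = 0.5 * rng.normal(size=width * d + width)
sol = solve_ivp(grad_flow, (0.0, 100.0), theta0, rtol=1e-8, atol=1e-10)
print("parameters at t=100:", np.round(sol.y[:, -1], 3))
```

An adaptive integrator with tight tolerances tracks the flow far more faithfully than fixed-step gradient descent, which is one motivation for treating training as an ODE.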

Saddle-to-Saddle Dynamics in Deep Linear Networks: Small Initialization Training, Symmetry, and Sparsity

no code implementations · 30 Jun 2021 · Arthur Jacot, François Ged, Berfin Şimşek, Clément Hongler, Franck Gabriel

The dynamics of Deep Linear Networks (DLNs) is dramatically affected by the variance $\sigma^2$ of the parameters at initialization $\theta_0$.

L2 Regularization
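A minimal simulation of the phenomenon (the width, depth, target, and learning rate below are illustrative choices, not the paper's setting): with a small initialization scale $\sigma$, gradient descent on a product of matrices sits near a saddle for a long stretch, then drops to the next plateau, learning the target one mode at a time.

```python
import numpy as np

rng = np.random.default_rng(2)
d, L, sigma, lr = 4, 3, 1e-2, 0.1       # width, depth, init scale, step size
target = np.diag([3.0, 1.0, 0.0, 0.0])  # low-rank target
Ws = [sigma * rng.normal(size=(d, d)) for _ in range(L)]

def prod(mats):
    """End-to-end linear map; first-applied factor on the right."""
    P = np.eye(d)
    for M in mats:
        P = M @ P
    return P

for step in range(10001):
    E = prod(Ws) - target               # residual of the end-to-end map
    # gradient of 0.5 * ||prod(Ws) - target||_F^2 w.r.t. each factor
    grads = [prod(Ws[i + 1:]).T @ E @ prod(Ws[:i]).T for i in range(L)]
    for W, g in zip(Ws, grads):
        W -= lr * g                     # in-place update of each factor
    if step % 1000 == 0:
        print(step, 0.5 * np.sum(E ** 2))  # loss plateaus, then drops, stepwise
```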

Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances

1 code implementation · 25 May 2021 · Berfin Şimşek, François Ged, Arthur Jacot, Francesco Spadaro, Clément Hongler, Wulfram Gerstner, Johanni Brea

For a two-layer overparameterized network of width $r^* + h =: m$, we explicitly describe the manifold of global minima: it consists of $T(r^*, m)$ affine subspaces of dimension at least $h$ that are connected to one another.
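The count $T(r^*, m)$ comes from symmetries of the parameterization. Here is a tiny NumPy check of the two mechanisms (the two-layer tanh network below is an illustrative assumption, not taken from the paper): permuting hidden neurons leaves the network function unchanged, and an extra neuron can be split off along a whole line of parameter settings that all compute the same function; this is how zero-loss parameters form connected affine pieces.

```python
import numpy as np

rng = np.random.default_rng(3)
d, m = 3, 5
X = rng.normal(size=(100, d))
W, a = rng.normal(size=(m, d)), rng.normal(size=m)

def f(X, W, a):
    """Two-layer tanh network output."""
    return np.tanh(X @ W.T) @ a

# permutation symmetry: relabeling the m hidden neurons never changes f
perm = rng.permutation(m)
assert np.allclose(f(X, W, a), f(X, W[perm], a[perm]))

# splitting symmetry: duplicate a neuron's incoming weights and share its
# outgoing weight; every t on the line gives the same function
for t in np.linspace(0.0, 1.0, 5):
    W2 = np.vstack([W, W[:1]])      # width m+1, last neuron copies neuron 0
    a2 = np.concatenate([a, [0.0]])
    a2[0], a2[-1] = t * a[0], (1 - t) * a[0]
    assert np.allclose(f(X, W, a), f(X, W2, a2))
print("permutation and neuron-splitting symmetries verified")
```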

Kernel Alignment Risk Estimator: Risk Prediction from Training Data

no code implementations · NeurIPS 2020 · Arthur Jacot, Berfin Şimşek, Francesco Spadaro, Clément Hongler, Franck Gabriel

Under a natural universality assumption (that the KRR moments depend asymptotically on the first two moments of the observations), we capture the mean and variance of the KRR predictor.

Regression
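To make the quantities concrete, here is a small Monte-Carlo sketch (the RBF kernel, data distribution, and ridge normalization are illustrative assumptions): it estimates the mean and variance of the KRR predictor at fixed test points by resampling the training set, the two moments the paper captures analytically under the universality assumption.

```python
import numpy as np

rng = np.random.default_rng(4)
n, lam = 50, 1e-2

def krr_predict(X, y, X_test, lam):
    """KRR predictor with an RBF kernel on 1-D inputs."""
    k = lambda A, B: np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2)
    alpha = np.linalg.solve(k(X, X) + len(X) * lam * np.eye(len(X)), y)
    return k(X_test, X) @ alpha

# resample the training set many times and look at the predictor's
# first two moments at fixed test points
X_test = np.linspace(-2, 2, 9)
preds = []
for _ in range(200):
    X = rng.uniform(-2, 2, size=n)
    y = np.sin(3 * X) + 0.1 * rng.normal(size=n)
    preds.append(krr_predict(X, y, X_test, lam))
preds = np.array(preds)
print("mean:    ", np.round(preds.mean(axis=0), 3))
print("variance:", np.round(preds.var(axis=0), 4))
```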

Implicit Regularization of Random Feature Models

no code implementations · ICML 2020 · Arthur Jacot, Berfin Şimşek, Francesco Spadaro, Clément Hongler, Franck Gabriel

We investigate, by means of random matrix theory, the connection between Gaussian random feature (RF) models and Kernel Ridge Regression (KRR).
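A quick numerical illustration of that connection (random Fourier features standing in for a Gaussian RF model, with an RBF kernel; all choices below are assumptions, and the paper's analysis is analytic rather than a simulation): ridge regression on $P$ random features approaches the KRR predictor as $P$ grows, and the finite-$P$ gap is where the implicit regularization lives.

```python
import numpy as np

rng = np.random.default_rng(5)
n, lam = 40, 1e-2
X = rng.uniform(-2, 2, size=n)
y = np.sin(3 * X) + 0.1 * rng.normal(size=n)
X_test = np.linspace(-2, 2, 7)

rbf = lambda A, B: np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2)

# exact KRR predictor with the RBF kernel
alpha = np.linalg.solve(rbf(X, X) + n * lam * np.eye(n), y)
krr = rbf(X_test, X) @ alpha

# ridge regression on P random Fourier features (whose kernel is the RBF above)
for P in (10, 100, 1000):
    w = rng.normal(size=P)
    b = rng.uniform(0.0, 2 * np.pi, size=P)
    phi = lambda A: np.sqrt(2.0 / P) * np.cos(np.outer(A, w) + b)
    Phi = phi(X)                                       # (n, P) feature matrix
    theta = Phi.T @ np.linalg.solve(Phi @ Phi.T + n * lam * np.eye(n), y)
    print(P, np.max(np.abs(phi(X_test) @ theta - krr)))  # gap shrinks with P
```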
