Search Results for author: Berfin Şimşek

Found 7 papers, 3 papers with code

Should Under-parameterized Student Networks Copy or Average Teacher Weights?

1 code implementation · NeurIPS 2023 · Berfin Şimşek, Amire Bendjeddou, Wulfram Gerstner, Johanni Brea

Approximating a target function $f^*$, itself given by a neural network with $k$ neurons, using a network with $n < k$ neurons can be seen as fitting an under-parameterized "student" network with $n$ neurons to a "teacher" network with $k$ neurons.
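As a concrete illustration of this setup, here is a minimal NumPy sketch (the widths, tanh activation, batch size, and plain gradient descent are illustrative assumptions, not the paper's experimental protocol): a fixed teacher with $k$ neurons generates the targets, and an under-parameterized student with $n < k$ neurons is fit to it. Whether trained student neurons copy individual teacher neurons or average groups of them is the question the paper studies.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 5, 8, 3        # input dim, teacher width k, student width n < k

# fixed "teacher" network with k hidden neurons
W_t, a_t = rng.normal(size=(k, d)), rng.normal(size=k)

def net(x, W, a):
    """Two-layer network: sum_i a_i * tanh(w_i . x)."""
    return np.tanh(x @ W.T) @ a

# "student" with n < k neurons, trained by plain gradient descent on MSE
W, a = 0.1 * rng.normal(size=(n, d)), 0.1 * rng.normal(size=n)
lr = 1e-2
for step in range(5000):
    x = rng.normal(size=(256, d))           # fresh Gaussian inputs each step
    h = np.tanh(x @ W.T)                    # student hidden activity, (256, n)
    err = h @ a - net(x, W_t, a_t)          # residual against the teacher
    grad_a = h.T @ err / len(x)
    grad_W = ((err[:, None] * a) * (1 - h ** 2)).T @ x / len(x)
    a -= lr * grad_a
    W -= lr * grad_W

# inspect whether student neurons align with (copy) single teacher
# directions or sit between several of them (average)
print(np.round(W @ W_t.T, 2))               # student/teacher weight overlaps
```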

Statistical physics, Bayesian inference and neural information processing

no code implementations · 29 Sep 2023 · Erin Grant, Sandra Nestler, Berfin Şimşek, Sara Solla

Lecture notes from the course given by Professor Sara A. Solla at the Les Houches summer school on "Statistical Physics of Machine Learning".

Bayesian Inference · Dimensionality Reduction

MLPGradientFlow: going with the flow of multilayer perceptrons (and finding minima fast and accurately)

2 code implementations · 25 Jan 2023 · Johanni Brea, Flavio Martinelli, Berfin Şimşek, Wulfram Gerstner

MLPGradientFlow is a software package for numerically solving the gradient flow differential equation $\dot \theta = -\nabla \mathcal L(\theta; \mathcal D)$, where $\theta$ are the parameters of a multi-layer perceptron, $\mathcal D$ is a data set, and $\nabla \mathcal L$ is the gradient of a loss function.
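The equation above is an ordinary differential equation in parameter space, so its solution can be approximated with a generic ODE integrator. The sketch below is not MLPGradientFlow itself (that is a Julia package); the tiny network, toy data, and SciPy solver here are assumptions for illustration of what "integrating the gradient flow" means:

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(1)
X = rng.normal(size=(64, 2))          # toy data set D
y = np.sin(X[:, 0])                   # toy targets
d, width = 2, 4

def grad_flow(t, theta):
    """Right-hand side of the gradient flow: -grad L(theta; D) for a
    one-hidden-layer tanh perceptron with squared-error loss."""
    W = theta[:width * d].reshape(width, d)
    a = theta[width * d:]
    h = np.tanh(X @ W.T)              # hidden activations
    err = h @ a - y                   # residuals
    gW = ((err[:, None] * a) * (1 - h ** 2)).T @ X / len(X)
    ga = h.T @ err / len(X)
    return -np.concatenate([gW.ravel(), ga])

theta0 = 0.5 * rng.normal(size=width * d + width)
sol = solve_ivp(grad_flow, (0.0, 100.0), theta0, rtol=1e-8, atol=1e-10)
print("parameters at t=100:", np.round(sol.y[:, -1], 3))
```

An adaptive integrator with tight tolerances tracks the flow far more faithfully than fixed-step gradient descent, which is one motivation for treating training as an ODE.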

Saddle-to-Saddle Dynamics in Deep Linear Networks: Small Initialization Training, Symmetry, and Sparsity

no code implementations · 30 Jun 2021 · Arthur Jacot, François Ged, Berfin Şimşek, Clément Hongler, Franck Gabriel

The dynamics of Deep Linear Networks (DLNs) is dramatically affected by the variance $\sigma^2$ of the parameters at initialization $\theta_0$.

L2 Regularization
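A minimal simulation of the phenomenon (the width, depth, target, and learning rate below are illustrative choices, not the paper's setting): with a small initialization scale $\sigma$, gradient descent on a product of matrices sits near a saddle for a long stretch, then drops to the next plateau, learning the target one mode at a time.

```python
import numpy as np

rng = np.random.default_rng(2)
d, L, sigma, lr = 4, 3, 1e-2, 0.1       # width, depth, init scale, step size
target = np.diag([3.0, 1.0, 0.0, 0.0])  # low-rank target
Ws = [sigma * rng.normal(size=(d, d)) for _ in range(L)]

def prod(mats):
    """End-to-end linear map; first-applied factor on the right."""
    P = np.eye(d)
    for M in mats:
        P = M @ P
    return P

for step in range(10001):
    E = prod(Ws) - target               # residual of the end-to-end map
    # gradient of 0.5 * ||prod(Ws) - target||_F^2 w.r.t. each factor
    grads = [prod(Ws[i + 1:]).T @ E @ prod(Ws[:i]).T for i in range(L)]
    for W, g in zip(Ws, grads):
        W -= lr * g                     # in-place update of each factor
    if step % 1000 == 0:
        print(step, 0.5 * np.sum(E ** 2))  # loss plateaus, then drops, stepwise
```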

Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances

1 code implementation · 25 May 2021 · Berfin Şimşek, François Ged, Arthur Jacot, Francesco Spadaro, Clément Hongler, Wulfram Gerstner, Johanni Brea

For a two-layer overparameterized network of width $r^* + h =: m$, we explicitly describe the manifold of global minima: it consists of $T(r^*, m)$ affine subspaces of dimension at least $h$ that are connected to one another.
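The count $T(r^*, m)$ comes from symmetries of the parameterization. Here is a tiny NumPy check of the two mechanisms (the two-layer tanh network below is an illustrative assumption, not taken from the paper): permuting hidden neurons leaves the network function unchanged, and an extra neuron can be split off along a whole line of parameter settings that all compute the same function; this is how zero-loss parameters form connected affine pieces.

```python
import numpy as np

rng = np.random.default_rng(3)
d, m = 3, 5
X = rng.normal(size=(100, d))
W, a = rng.normal(size=(m, d)), rng.normal(size=m)

def f(X, W, a):
    """Two-layer tanh network output."""
    return np.tanh(X @ W.T) @ a

# permutation symmetry: relabeling the m hidden neurons never changes f
perm = rng.permutation(m)
assert np.allclose(f(X, W, a), f(X, W[perm], a[perm]))

# splitting symmetry: duplicate a neuron's incoming weights and share its
# outgoing weight; every t on the line gives the same function
for t in np.linspace(0.0, 1.0, 5):
    W2 = np.vstack([W, W[:1]])      # width m+1, last neuron copies neuron 0
    a2 = np.concatenate([a, [0.0]])
    a2[0], a2[-1] = t * a[0], (1 - t) * a[0]
    assert np.allclose(f(X, W, a), f(X, W2, a2))
print("permutation and neuron-splitting symmetries verified")
```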

Kernel Alignment Risk Estimator: Risk Prediction from Training Data

no code implementations · NeurIPS 2020 · Arthur Jacot, Berfin Şimşek, Francesco Spadaro, Clément Hongler, Franck Gabriel

Under a natural universality assumption (that the KRR moments depend asymptotically on the first two moments of the observations), we capture the mean and variance of the KRR predictor.

Regression
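To make the quantities concrete, here is a small Monte-Carlo sketch (the RBF kernel, data distribution, and ridge normalization are illustrative assumptions): it estimates the mean and variance of the KRR predictor at fixed test points by resampling the training set, the two moments the paper captures analytically under the universality assumption.

```python
import numpy as np

rng = np.random.default_rng(4)
n, lam = 50, 1e-2

def krr_predict(X, y, X_test, lam):
    """KRR predictor with an RBF kernel on 1-D inputs."""
    k = lambda A, B: np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2)
    alpha = np.linalg.solve(k(X, X) + len(X) * lam * np.eye(len(X)), y)
    return k(X_test, X) @ alpha

# resample the training set many times and look at the predictor's
# first two moments at fixed test points
X_test = np.linspace(-2, 2, 9)
preds = []
for _ in range(200):
    X = rng.uniform(-2, 2, size=n)
    y = np.sin(3 * X) + 0.1 * rng.normal(size=n)
    preds.append(krr_predict(X, y, X_test, lam))
preds = np.array(preds)
print("mean:    ", np.round(preds.mean(axis=0), 3))
print("variance:", np.round(preds.var(axis=0), 4))
```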

Implicit Regularization of Random Feature Models

no code implementations · ICML 2020 · Arthur Jacot, Berfin Şimşek, Francesco Spadaro, Clément Hongler, Franck Gabriel

We investigate, by means of random matrix theory, the connection between Gaussian random feature (RF) models and Kernel Ridge Regression (KRR).
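A quick numerical illustration of that connection (random Fourier features standing in for a Gaussian RF model, with an RBF kernel; all choices below are assumptions, and the paper's analysis is analytic rather than a simulation): ridge regression on $P$ random features approaches the KRR predictor as $P$ grows, and the finite-$P$ gap is where the implicit regularization lives.

```python
import numpy as np

rng = np.random.default_rng(5)
n, lam = 40, 1e-2
X = rng.uniform(-2, 2, size=n)
y = np.sin(3 * X) + 0.1 * rng.normal(size=n)
X_test = np.linspace(-2, 2, 7)

rbf = lambda A, B: np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2)

# exact KRR predictor with the RBF kernel
alpha = np.linalg.solve(rbf(X, X) + n * lam * np.eye(n), y)
krr = rbf(X_test, X) @ alpha

# ridge regression on P random Fourier features (whose kernel is the RBF above)
for P in (10, 100, 1000):
    w = rng.normal(size=P)
    b = rng.uniform(0.0, 2 * np.pi, size=P)
    phi = lambda A: np.sqrt(2.0 / P) * np.cos(np.outer(A, w) + b)
    Phi = phi(X)                                       # (n, P) feature matrix
    theta = Phi.T @ np.linalg.solve(Phi @ Phi.T + n * lam * np.eye(n), y)
    print(P, np.max(np.abs(phi(X_test) @ theta - krr)))  # gap shrinks with P
```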
