no code implementations • 28 Feb 2024 • Vivien Cabannes, Berfin Simsek, Alberto Bietti
This work focuses on the training dynamics of one associative memory module storing outer products of token embeddings.
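An associative memory built from outer products of token embeddings can be sketched in a few lines of NumPy. This is a minimal illustration of the general idea only, not the paper's training dynamics; all names and dimensions here are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64       # embedding dimension (illustrative)
pairs = 5    # number of key-value associations to store

# Random, approximately orthogonal token embeddings for keys and values.
keys = rng.standard_normal((pairs, d)) / np.sqrt(d)
values = rng.standard_normal((pairs, d)) / np.sqrt(d)

# The memory is a single weight matrix: a sum of outer products v_i k_i^T.
W = sum(np.outer(v, k) for k, v in zip(keys, values))

# Retrieval: W @ key recovers the stored value up to cross-term noise,
# which is small when the keys are near-orthogonal.
recalled = W @ keys[0]
cos = recalled @ values[0] / (np.linalg.norm(recalled) * np.linalg.norm(values[0]))
print(f"cosine similarity with stored value: {cos:.2f}")
```

Because the keys are random in a 64-dimensional space, the cross terms $v_i (k_i \cdot k_0)$ for $i \neq 0$ stay small relative to the signal term, so retrieval is close to exact.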
no code implementations • 8 Feb 2024 • Zhengqing Wu, Berfin Simsek, Francois Ged
Additionally, we show that if a stationary point does not contain "escape neurons", which are defined via first-order conditions, then it must be a local minimum.
no code implementations • 25 Apr 2023 • Flavio Martinelli, Berfin Simsek, Wulfram Gerstner, Johanni Brea
Can we identify the parameters of a neural network by probing its input-output mapping?
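One obstruction to identifying parameters from the input-output mapping alone is network symmetry: for a one-hidden-layer tanh network, permuting hidden neurons or flipping the sign of a neuron's input and output weights leaves the mapping unchanged, so probing can only recover parameters up to these symmetries. A minimal sketch (the network and its symmetry transforms below are generic examples, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, hidden = 3, 4

# A one-hidden-layer tanh "teacher" network.
W1 = rng.standard_normal((hidden, d_in))
w2 = rng.standard_normal(hidden)

def net(x, W1_, w2_):
    return w2_ @ np.tanh(W1_ @ x)

# Permute the hidden neurons and flip signs on some of them; because tanh
# is odd, flipping both a neuron's input row and output weight cancels out.
perm = rng.permutation(hidden)
signs = np.array([1, -1, 1, -1])
W1_alt = (signs[:, None] * W1)[perm]
w2_alt = (signs * w2)[perm]

# The two distinct parameter settings define the same input-output map.
for _ in range(5):
    x = rng.standard_normal(d_in)
    assert np.isclose(net(x, W1, w2), net(x, W1_alt, w2_alt))
print("distinct parameters, identical input-output mapping")
```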
no code implementations • 28 Mar 2022 • Berfin Simsek, Melissa Hall, Levent Sagun
Existing works show that although modern neural networks achieve remarkable generalization performance on the in-distribution (ID) dataset, their accuracy drops significantly on out-of-distribution (OOD) datasets (Recht et al., 2018; 2019).
no code implementations • 25 Sep 2019 • Berfin Simsek, Johanni Brea, Bernd Illing, Wulfram Gerstner
In a network of $d-1$ hidden layers with $n_k$ neurons in layers $k = 1, \ldots, d$, we construct continuous paths between equivalent global minima that lead through a "permutation point" where the input and output weight vectors of two neurons in the same hidden layer $k$ collide and interchange.
no code implementations • 5 Jul 2019 • Johanni Brea, Berfin Simsek, Bernd Illing, Wulfram Gerstner
The permutation symmetry of neurons in each layer of a deep neural network gives rise not only to multiple equivalent global minima of the loss function, but also to first-order saddle points located on the path between the global minima.
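The permutation symmetry and the barrier between equivalent minima can both be seen in a toy example. In the sketch below (a generic two-neuron illustration under my own assumed setup, not the papers' construction), swapping the two hidden neurons of a network that exactly fits the data gives a second zero-loss point, while on the straight line between the two minima the loss rises; at the midpoint the two neurons' weight vectors collide, which is the flavor of "permutation point" described above.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two-neuron, one-hidden-layer network; the teacher itself defines a
# zero-loss global minimum of the squared error on its own outputs.
W1 = rng.standard_normal((2, 1))
w2 = rng.standard_normal(2)
X = rng.standard_normal((200, 1))
y = np.tanh(X @ W1.T) @ w2

def loss(W1_, w2_):
    return np.mean((np.tanh(X @ W1_.T) @ w2_ - y) ** 2)

# Swapping the two hidden neurons yields an equivalent global minimum.
W1p, w2p = W1[::-1], w2[::-1]
assert np.isclose(loss(W1p, w2p), 0.0)

# Along the straight line between the two minima the loss rises; at
# t = 0.5 the two neurons' weight vectors collide into one.
ts = np.linspace(0, 1, 11)
line = [loss((1 - t) * W1 + t * W1p, (1 - t) * w2 + t * w2p) for t in ts]
print(f"loss at endpoints: {line[0]:.4f}, {line[-1]:.4f}; midpoint: {line[5]:.4f}")
```

The nonzero midpoint loss shows that the linear path leaves the set of global minima; whether the barrier contains a first-order saddle is exactly the kind of question the cited works analyze, not something this toy sketch establishes.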