1 code implementation • 15 Sep 2022 • Frederik Benzing, Simon Schug, Robert Meier, Johannes von Oswald, Yassir Akram, Nicolas Zucchet, Laurence Aitchison, Angelika Steger
Neural networks trained with stochastic gradient descent (SGD) starting from different random initialisations typically find functionally very similar solutions, raising the question of whether there are meaningful differences between different SGD solutions.
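A minimal sketch of the phenomenon the excerpt describes (an illustration written for this listing, not the paper's code): two small MLPs are trained with SGD from different random seeds on the same synthetic task and then compared by how often their predictions agree. The architecture, task, and hyperparameters are arbitrary choices of this sketch.

```python
# Illustrative sketch: functional similarity of SGD solutions across seeds.
import torch
import torch.nn as nn

def train_mlp(seed, X, y, steps=500):
    torch.manual_seed(seed)                     # different initialisation per seed
    model = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 2))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(model(X), y)
        loss.backward()
        opt.step()
    return model

torch.manual_seed(0)
X = torch.randn(512, 2)
y = (X[:, 0] * X[:, 1] > 0).long()              # simple XOR-like labels

model_a = train_mlp(seed=1, X=X, y=y)
model_b = train_mlp(seed=2, X=X, y=y)

preds_a = model_a(X).argmax(dim=1)
preds_b = model_b(X).argmax(dim=1)
agreement = (preds_a == preds_b).float().mean()
print(f"prediction agreement between seeds: {agreement:.2%}")
```

High agreement on a toy task like this mirrors the "functionally very similar solutions" observation; the paper's question is what meaningful differences remain beyond such agreement.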
1 code implementation • 28 Jan 2022 • Frederik Benzing
Second-order optimizers are thought to hold the potential to speed up neural network training, but due to the enormous size of the curvature matrix, they typically require approximations to be computationally tractable.
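To make the scaling problem concrete, here is an illustrative sketch (not the paper's optimizer): for d parameters the full curvature matrix has d^2 entries, so the toy below preconditions gradient descent with a diagonal empirical-Fisher estimate instead. The `damping` constant and all hyperparameters are assumptions of this sketch.

```python
# Illustrative sketch: diagonal curvature approximation as a preconditioner.
import torch

torch.manual_seed(0)
n_samples, d = 256, 1000       # the full curvature matrix would have d^2 = 1e6 entries
X = torch.randn(n_samples, d) / d ** 0.5
y = torch.randn(n_samples)
w = torch.randn(d, requires_grad=True)

lr, damping = 0.1, 1e-3
for step in range(200):
    residual = X @ w - y
    loss = 0.5 * (residual ** 2).mean()
    (grad,) = torch.autograd.grad(loss, w)
    # Diagonal curvature estimate: per-coordinate mean of squared per-example
    # gradients (the empirical Fisher diagonal); O(d) memory instead of O(d^2).
    fisher_diag = ((X * residual[:, None]) ** 2).mean(dim=0)
    with torch.no_grad():
        w -= lr * grad / (fisher_diag + damping)   # preconditioned update
print(f"final loss: {loss.item():.4f}")
```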
2 code implementations • 11 Jun 2020 • Frederik Benzing
Moreover, we show that for Synaptic Intelligence (SI), the relation to the Fisher information (and in fact SI's performance) is due to a previously unknown bias.
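The sketch below (a construction for this listing, not the paper's code) computes SI's path-integral importance, Omega_k = sum_t (-g_t * dw_t) / ((w_T - w_0)^2 + xi), alongside a crude diagonal-Fisher proxy on a toy task. Note that under plain SGD each SI increment equals lr * g_t^2, which is one way to see why the two measures end up related; the model, task, and the value of xi are assumptions here.

```python
# Illustrative sketch: SI path-integral importance vs. a diagonal-Fisher proxy.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(256, 20)
y = (X.sum(dim=1) > 0).long()

model = nn.Linear(20, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.05)
w = model.weight

w0 = w.detach().clone()
path_integral = torch.zeros_like(w)     # SI's running sum of -g * dw
fisher_diag = torch.zeros_like(w)       # running mean of squared gradients (crude Fisher proxy)

steps = 200
for _ in range(steps):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(X), y)
    loss.backward()
    w_before = w.detach().clone()
    opt.step()
    dw = w.detach() - w_before          # for plain SGD, dw = -lr * grad
    path_integral += -w.grad * dw       # hence each increment is lr * grad^2
    fisher_diag += w.grad ** 2 / steps

xi = 1e-3                               # SI's damping constant (value assumed)
si_importance = path_integral / ((w.detach() - w0) ** 2 + xi)
corr = torch.corrcoef(torch.stack([si_importance.flatten(), fisher_diag.flatten()]))[0, 1]
print(f"correlation between SI importance and Fisher proxy: {corr:.3f}")
```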
1 code implementation • 11 Feb 2019 • Frederik Benzing, Marcelo Matheus Gauy, Asier Mujika, Anders Martinsson, Angelika Steger
In contrast, the online training algorithm Real Time Recurrent Learning (RTRL) provides untruncated gradients, but its computational cost is impractically large.
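As an illustration of both points, here is a minimal RTRL sketch for a tiny vanilla RNN h_t = tanh(W h_{t-1} + x_t), written from the standard RTRL recurrence rather than taken from the paper: the sensitivity matrix S_t = dh_t/dvec(W) has n * n^2 entries and its update costs O(n^4) operations per time step, which is the impractical cost the excerpt refers to.

```python
# Illustrative sketch: RTRL for a tiny vanilla RNN, carrying full sensitivities forward.
import numpy as np

rng = np.random.default_rng(0)
n, T = 8, 20                                # hidden size, sequence length
W = 0.5 * rng.standard_normal((n, n)) / np.sqrt(n)
xs = rng.standard_normal((T, n))
target = rng.standard_normal(n)

h = np.zeros(n)
S = np.zeros((n, n * n))                    # sensitivity dh/dvec(W), row-major vec(W)

for x in xs:
    a = W @ h + x
    h_new = np.tanh(a)
    D = np.diag(1.0 - h_new ** 2)           # Jacobian of tanh at the pre-activation
    direct = np.kron(np.eye(n), h.reshape(1, -1))   # da_i/dW[i, j] = h_prev[j]
    S = D @ (W @ S + direct)                # O(n^4) matrix product at every step
    h = h_new

# Toy loss L = 0.5 * ||h_T - target||^2; the untruncated gradient comes
# straight from the sensitivities, with no backpropagation through time.
dL_dh = h - target
grad_W = (dL_dh @ S).reshape(n, n)
print("gradient norm:", np.linalg.norm(grad_W))
```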