no code implementations • 27 Mar 2025 • Nicolas Zucchet, Jörg Bornschein, Stephanie Chan, Andrew Lampinen, Razvan Pascanu, Soham De
Large language models accumulate vast knowledge during pre-training, yet the dynamics governing this acquisition remain poorly understood.
1 code implementation • 31 May 2024 • Nicolas Zucchet, Antonio Orvieto
Recurrent neural networks (RNNs) notoriously struggle to learn long-term memories, primarily due to vanishing and exploding gradients.
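As a concrete illustration of the vanishing/exploding problem (a minimal numpy sketch, not the paper's code), backpropagating through T steps of a linear recurrence h_t = W h_{t-1} multiplies the gradient by Wᵀ at every step, so its norm scales roughly like (spectral radius of W)^T:

```python
# Minimal numpy sketch (illustrative only) of vanishing/exploding gradients in
# a linear RNN h_t = W @ h_{t-1}: backpropagation applies W^T once per step,
# so the gradient norm behaves roughly like rho(W)^T.
import numpy as np

rng = np.random.default_rng(0)
T, d = 200, 32

for scale, label in [(0.9, "contractive (rho < 1)"), (1.1, "expansive (rho > 1)")]:
    W = rng.standard_normal((d, d)) / np.sqrt(d)
    W *= scale / np.abs(np.linalg.eigvals(W)).max()  # set spectral radius to `scale`
    grad = rng.standard_normal(d)                    # gradient arriving at the last step
    for _ in range(T):
        grad = W.T @ grad                            # one step of backprop through time
    print(f"{label}: |grad| after {T} steps = {np.linalg.norm(grad):.2e}")
```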
no code implementations • 11 Sep 2023 • Johannes von Oswald, Maximilian Schlegel, Alexander Meulemans, Seijin Kobayashi, Eyvind Niklasson, Nicolas Zucchet, Nino Scherrer, Nolan Miller, Mark Sandler, Blaise Agüera y Arcas, Max Vladymyrov, Razvan Pascanu, João Sacramento
Some autoregressive models exhibit in-context learning capabilities: they can learn as an input sequence is processed, without any parameter updates and without being explicitly trained to do so.
no code implementations • 4 Sep 2023 • Nicolas Zucchet, Seijin Kobayashi, Yassir Akram, Johannes von Oswald, Maxime Larcher, Angelika Steger, João Sacramento
In particular, we examine RNNs trained to solve simple in-context learning tasks on which Transformers are known to excel and find that gradient descent instills in our RNNs the same attention-based in-context learning algorithm used by Transformers.
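To make "attention-based in-context learning algorithm" concrete, here is a toy numpy sketch (an illustrative construction, not the paper's code; the learning rate eta is an arbitrary choice): a single linear self-attention readout with keys x_j, values y_j and query x_q computes exactly the prediction of one gradient-descent step on the in-context regression loss.

```python
# Toy sketch (illustrative, not the paper's code): a linear self-attention
# readout reproduces one gradient-descent step on in-context linear regression.
import numpy as np

rng = np.random.default_rng(1)
n, d = 16, 4                      # number of in-context examples, input dimension
W_true = rng.standard_normal(d)
X = rng.standard_normal((n, d))   # in-context inputs x_j
y = X @ W_true                    # in-context targets y_j
x_q = rng.standard_normal(d)      # query input
eta = 0.1                         # learning rate of the implicit GD step

# (1) Explicit learner: one GD step on L(w) = 0.5 * sum_j (w @ x_j - y_j)^2, from w = 0.
w_one_step = eta * (y @ X)        # w_1 = eta * sum_j y_j x_j
pred_gd = w_one_step @ x_q

# (2) Linear attention: values = y_j, keys = x_j, query = x_q, no softmax.
pred_attn = eta * np.sum(y * (X @ x_q))

print(np.isclose(pred_gd, pred_attn))  # True: both compute eta * sum_j y_j <x_j, x_q>
```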
1 code implementation • NeurIPS 2023 • Nicolas Zucchet, Robert Meier, Simon Schug, Asier Mujika, João Sacramento
Online learning holds the promise of enabling efficient long-term credit assignment in recurrent neural networks.
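A minimal sketch of what online credit assignment can look like, under the simplifying assumption of an element-wise linear recurrence (an illustration of the general idea, not the paper's algorithm): the sensitivity dh_t/dλ obeys its own forward recursion, so the gradient can be accumulated step by step without backpropagating through time.

```python
# Illustrative sketch of online gradient computation for the element-wise
# recurrence h_t = lam * h_{t-1} + x_t: the trace e_t = dh_t/dlam is carried
# forward alongside h_t, so no backward pass over the sequence is needed.
import numpy as np

rng = np.random.default_rng(2)
T, d = 50, 8
x = rng.standard_normal((T, d))
target = rng.standard_normal((T, d))
lam = 0.9 * np.ones(d)

def total_loss(lam):
    h, loss = np.zeros(d), 0.0
    for t in range(T):
        h = lam * h + x[t]
        loss += 0.5 * np.sum((h - target[t]) ** 2)
    return loss

h, e, grad = np.zeros(d), np.zeros(d), np.zeros(d)
for t in range(T):
    e = lam * e + h              # dh_t/dlam = h_{t-1} + lam * dh_{t-1}/dlam
    h = lam * h + x[t]
    grad += (h - target[t]) * e  # accumulate dL_t/dlam online, at every step

# Check one coordinate against finite differences.
eps = 1e-5
lam_p = lam.copy(); lam_p[0] += eps
lam_m = lam.copy(); lam_m[0] -= eps
print(grad[0], (total_loss(lam_p) - total_loss(lam_m)) / (2 * eps))
```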
1 code implementation • 15 Sep 2022 • Frederik Benzing, Simon Schug, Robert Meier, Johannes von Oswald, Yassir Akram, Nicolas Zucchet, Laurence Aitchison, Angelika Steger
Neural networks trained with stochastic gradient descent (SGD) starting from different random initialisations typically find functionally very similar solutions, raising the question of whether there are meaningful differences between different SGD solutions.
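To make "functionally similar solutions" concrete, a small illustrative experiment (not the paper's; dataset, model size and agreement metric are arbitrary choices): train two identical MLPs from different random seeds on the same toy data and measure how often their held-out predictions agree.

```python
# Illustrative sketch: two MLPs trained with SGD from different random
# initialisations on the same data, compared by prediction agreement.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=2000, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

nets = [
    MLPClassifier(hidden_layer_sizes=(64, 64), solver="sgd",
                  learning_rate_init=0.1, max_iter=500, random_state=seed).fit(X_tr, y_tr)
    for seed in (1, 2)
]

preds = [net.predict(X_te) for net in nets]
print("test accuracies:", [net.score(X_te, y_te) for net in nets])
print("prediction agreement:", (preds[0] == preds[1]).mean())
```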
1 code implementation • 4 Jul 2022 • Alexander Meulemans, Nicolas Zucchet, Seijin Kobayashi, Johannes von Oswald, João Sacramento
As special cases, they include models of great current interest in both neuroscience and machine learning, such as deep neural networks, equilibrium recurrent neural networks, deep equilibrium models, and meta-learned models.
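For readers unfamiliar with the equilibrium models mentioned here, a minimal illustrative sketch (dimensions and scaling are arbitrary): the hidden state is defined implicitly as a fixed point h* = f(h*, x) of a single transformation, found below by naive fixed-point iteration.

```python
# Illustrative sketch of an equilibrium model: the state is the fixed point of
# h = f(h, x), computed by iterating f until the residual is negligible.
import numpy as np

rng = np.random.default_rng(4)
d = 16
W = rng.standard_normal((d, d))
W *= 0.9 / np.linalg.norm(W, 2)          # make f a contraction so iteration converges
U = rng.standard_normal((d, d)) / np.sqrt(d)
x = rng.standard_normal(d)

def f(h, x):
    return np.tanh(W @ h + U @ x)

h = np.zeros(d)
for _ in range(100):
    h_new = f(h, x)
    if np.linalg.norm(h_new - h) < 1e-8:
        break
    h = h_new

print("equilibrium residual:", np.linalg.norm(f(h, x) - h))  # ~0 at the fixed point
```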
no code implementations • 6 May 2022 • Nicolas Zucchet, João Sacramento
This paper reviews gradient-based techniques to solve bilevel optimization problems.
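As a hedged sketch of one such technique, implicit differentiation (on a toy quadratic inner problem chosen so the result can be checked analytically; not an example taken from the review): the hypergradient of the outer loss is obtained by solving a linear system with the inner-problem Hessian instead of unrolling the inner optimization.

```python
# Toy sketch of the implicit-differentiation hypergradient for a bilevel
# problem with inner objective g(w, theta) = 0.5*||w - theta||^2 + 0.5*alpha*||w||^2
# and outer loss F(theta) = 0.5*||w*(theta) - w_target||^2.
import numpy as np

d, alpha = 5, 0.5
rng = np.random.default_rng(3)
theta = rng.standard_normal(d)
w_target = rng.standard_normal(d)

w_star = theta / (1.0 + alpha)       # closed-form inner minimizer w*(theta)
H = (1.0 + alpha) * np.eye(d)        # d^2 g / dw^2 at the minimizer
B = -np.eye(d)                       # d^2 g / dw dtheta

dF_dw = w_star - w_target            # gradient of the outer loss w.r.t. w*

# Implicit function theorem: dw*/dtheta = -H^{-1} B, hence
# dF/dtheta = -(B^T H^{-1}) dF_dw, computed with a single linear solve.
hypergrad = -B.T @ np.linalg.solve(H, dF_dw)

# Sanity check against the analytic gradient (w* - w_target) / (1 + alpha).
print(np.allclose(hypergrad, dF_dw / (1.0 + alpha)))  # True
```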
1 code implementation • NeurIPS 2021 • Johannes von Oswald, Dominic Zhao, Seijin Kobayashi, Simon Schug, Massimo Caccia, Nicolas Zucchet, João Sacramento
We find that patterned sparsity emerges from this process, with the pattern of sparsity varying on a problem-by-problem basis.
1 code implementation • 4 Apr 2021 • Nicolas Zucchet, Simon Schug, Johannes von Oswald, Dominic Zhao, João Sacramento
Humans and other animals are capable of improving their learning performance as they solve related tasks from a given problem domain, to the point of being able to learn from extremely limited data.
no code implementations • ICLR 2021 Workshop "Learning to Learn" • Dominic Zhao, Nicolas Zucchet, João Sacramento, Johannes von Oswald
Finding neural network weights that generalize well from small datasets is difficult.