no code implementations • 31 Oct 2024 • Seijin Kobayashi, Yassir Akram, Johannes von Oswald
The effect of regularizers such as weight decay when training deep neural networks is not well understood.
no code implementations • 24 Oct 2024 • Alexander Meulemans, Seijin Kobayashi, Johannes von Oswald, Nino Scherrer, Eric Elmoznino, Blake Richards, Guillaume Lajoie, Blaise Agüera y Arcas, João Sacramento
Self-interested individuals often fail to cooperate, posing a fundamental challenge for multi-agent learning.
no code implementations • 20 Aug 2024 • Johannes von Oswald, Seijin Kobayashi, Yassir Akram, Angelika Steger
Randomization is a powerful tool that endows algorithms with remarkable properties.
no code implementations • 17 Jul 2024 • Seijin Kobayashi, Simon Schug, Yassir Akram, Florian Redhardt, Johannes von Oswald, Razvan Pascanu, Guillaume Lajoie, João Sacramento
Under what circumstances can transformers compositionally generalize from a subset of tasks to all possible combinations of tasks that share similar components?
1 code implementation • 9 Jun 2024 • Simon Schug, Seijin Kobayashi, Yassir Akram, João Sacramento, Razvan Pascanu
To further examine the hypothesis that the intrinsic hypernetwork of multi-head attention supports compositional generalization, we ablate whether making the hypernetwork generated linear value network nonlinear strengthens compositionality.
1 code implementation • 22 Dec 2023 • Simon Schug, Seijin Kobayashi, Yassir Akram, Maciej Wołczyk, Alexandra Proca, Johannes von Oswald, Razvan Pascanu, João Sacramento, Angelika Steger
This allows us to relate the problem of compositional generalization to that of identification of the underlying modules.
no code implementations • 11 Sep 2023 • Johannes von Oswald, Maximilian Schlegel, Alexander Meulemans, Seijin Kobayashi, Eyvind Niklasson, Nicolas Zucchet, Nino Scherrer, Nolan Miller, Mark Sandler, Blaise Agüera y Arcas, Max Vladymyrov, Razvan Pascanu, João Sacramento
Some autoregressive models exhibit in-context learning capabilities: being able to learn as an input sequence is processed, without undergoing any parameter changes, and without being explicitly trained to do so.
no code implementations • 4 Sep 2023 • Nicolas Zucchet, Seijin Kobayashi, Yassir Akram, Johannes von Oswald, Maxime Larcher, Angelika Steger, João Sacramento
In particular, we examine RNNs trained to solve simple in-context learning tasks on which Transformers are known to excel and find that gradient descent instills in our RNNs the same attention-based in-context learning algorithm used by Transformers.
no code implementations • 18 Oct 2022 • Seijin Kobayashi, Pau Vilimelis Aceituno, Johannes von Oswald
Identifying unfamiliar inputs, also known as out-of-distribution (OOD) detection, is a crucial property of any decision making process.
1 code implementation • 17 Oct 2022 • Elvis Nava, Seijin Kobayashi, Yifei Yin, Robert K. Katzschmann, Benjamin F. Grewe
Our methods repurpose the popular generative image synthesis techniques of natural language guidance and diffusion models to generate neural network weights adapted for tasks.
1 code implementation • 4 Jul 2022 • Alexander Meulemans, Nicolas Zucchet, Seijin Kobayashi, Johannes von Oswald, João Sacramento
As special cases, they include models of great current interest in both neuroscience and machine learning, such as deep neural networks, equilibrium recurrent neural networks, deep equilibrium models, or meta-learning.
1 code implementation • NeurIPS 2021 • Johannes von Oswald, Dominic Zhao, Seijin Kobayashi, Simon Schug, Massimo Caccia, Nicolas Zucchet, João Sacramento
We find that patterned sparsity emerges from this process, with the pattern of sparsity varying on a problem-by-problem basis.
3 code implementations • NeurIPS 2021 • Christian Henning, Maria R. Cervera, Francesco D'Angelo, Johannes von Oswald, Regina Traber, Benjamin Ehret, Seijin Kobayashi, Benjamin F. Grewe, João Sacramento
We offer a practical deep learning implementation of our framework based on probabilistic task-conditioned hypernetworks, an approach we term posterior meta-replay.
2 code implementations • ICLR 2021 • Johannes von Oswald, Seijin Kobayashi, Alexander Meulemans, Christian Henning, Benjamin F. Grewe, João Sacramento
The largely successful method of training neural networks is to learn their weights using some variant of stochastic gradient descent (SGD).
Ranked #69 on
Image Classification
on CIFAR-100
(using extra training data)