Search Results for author: Simon Schug

Found 7 papers, 6 with code

When can transformers compositionally generalize in-context?

no code implementations · 17 Jul 2024 · Seijin Kobayashi, Simon Schug, Yassir Akram, Florian Redhardt, Johannes von Oswald, Razvan Pascanu, Guillaume Lajoie, João Sacramento

Under what circumstances can transformers compositionally generalize from a subset of tasks to all possible combinations of tasks that share similar components?

Attention as a Hypernetwork

1 code implementation · 9 Jun 2024 · Simon Schug, Seijin Kobayashi, Yassir Akram, João Sacramento, Razvan Pascanu

To further examine the hypothesis that the intrinsic hypernetwork of multi-head attention supports compositional generalization, we ablate whether making the hypernetwork-generated linear value network nonlinear strengthens compositionality.
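The "intrinsic hypernetwork" view referenced here can be made concrete: multi-head attention output is mathematically identical to applying a data-dependent linear map to each token, where the attention scores linearly combine the per-head value and output projections. The sketch below is an illustrative reconstruction of this identity (not code from the paper's repository); all variable names and dimensions are made up for the example. The paper's ablation then asks whether inserting a nonlinearity into this generated value network helps compositional generalization; the sketch shows only the linear case.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d, H, dh = 4, 8, 2, 4  # tokens, model dim, heads, head dim (arbitrary)

X = rng.normal(size=(T, d))
Wq = rng.normal(size=(H, d, dh))
Wk = rng.normal(size=(H, d, dh))
Wv = rng.normal(size=(H, d, dh))
Wo = rng.normal(size=(H, dh, d))  # per-head slice of the output projection

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Attention scores per head: A[h, i, j]
A = np.stack([softmax((X @ Wq[h]) @ (X @ Wk[h]).T / np.sqrt(dh))
              for h in range(H)])

# 1) Standard multi-head attention: attend, project values, sum heads.
std = sum(A[h] @ (X @ Wv[h]) @ Wo[h] for h in range(H))

# 2) Hypernetwork view: for each (query i, key j) the scores generate a
#    linear map W_ij = sum_h A[h,i,j] * (Wv[h] @ Wo[h]) applied to x_j.
hyper = np.zeros((T, d))
for i in range(T):
    for j in range(T):
        W_ij = sum(A[h, i, j] * (Wv[h] @ Wo[h]) for h in range(H))
        hyper[i] += X[j] @ W_ij

assert np.allclose(std, hyper)  # the two computations coincide
```

The equivalence holds because attention is linear in the values, so the head sum can be pulled inside and read as score-weighted composition of fixed linear maps.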

Online learning of long-range dependencies

1 code implementation · NeurIPS 2023 · Nicolas Zucchet, Robert Meier, Simon Schug, Asier Mujika, João Sacramento

Online learning holds the promise of enabling efficient long-term credit assignment in recurrent neural networks.

Random initialisations performing above chance and how to find them

1 code implementation · 15 Sep 2022 · Frederik Benzing, Simon Schug, Robert Meier, Johannes von Oswald, Yassir Akram, Nicolas Zucchet, Laurence Aitchison, Angelika Steger

Neural networks trained with stochastic gradient descent (SGD) starting from different random initialisations typically find functionally very similar solutions, raising the question of whether there are meaningful differences between different SGD solutions.

A contrastive rule for meta-learning

1 code implementation · 4 Apr 2021 · Nicolas Zucchet, Simon Schug, Johannes von Oswald, Dominic Zhao, João Sacramento

Humans and other animals are capable of improving their learning performance as they solve related tasks from a given problem domain, to the point of being able to learn from extremely limited data.

