Search Results for author: João Sacramento

Found 22 papers, 14 papers with code

When can transformers compositionally generalize in-context?

No code implementations · 17 Jul 2024 · Seijin Kobayashi, Simon Schug, Yassir Akram, Florian Redhardt, Johannes von Oswald, Razvan Pascanu, Guillaume Lajoie, João Sacramento

Under what circumstances can transformers compositionally generalize from a subset of tasks to all possible combinations of tasks that share similar components?

State Soup: In-Context Skill Learning, Retrieval and Mixing

No code implementations · 12 Jun 2024 · Maciej Pióro, Maciej Wołczyk, Razvan Pascanu, Johannes von Oswald, João Sacramento

A new breed of gated-linear recurrent neural networks has reached state-of-the-art performance on a range of sequence modeling problems.

In-Context Learning, Mamba

Attention as a Hypernetwork

1 code implementation · 9 Jun 2024 · Simon Schug, Seijin Kobayashi, Yassir Akram, João Sacramento, Razvan Pascanu

To further examine the hypothesis that the intrinsic hypernetwork of multi-head attention supports compositional generalization, we ablate whether making the hypernetwork-generated linear value network nonlinear strengthens compositionality.
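The hypernetwork reading of multi-head attention can be illustrated with a small sketch, under simplifying assumptions of our own: the attention scores for one query are given as inputs (already softmax-normalised), and each head h has a value matrix `w_v[h]` and an output matrix `w_o[h]`. The standard computation sums the per-head outputs; the hypernetwork view instead treats the vector of per-head scores for each key as a latent code that linearly combines the products `w_o[h] @ w_v[h]` into a key-specific linear value network. Function names and numbers are hypothetical; this shows the algebraic identity only, not the paper's code.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def mha_standard(xs, scores, w_v, w_o) -> Vector:
    """Output for one query: sum_h w_o[h] @ (sum_k scores[h][k] * w_v[h] @ x_k)."""
    out = [0.0] * len(w_o[0])
    for h in range(len(w_v)):
        mixed = [0.0] * len(w_v[h])  # head h's attention-weighted value vector
        for k, x in enumerate(xs):
            vx = matvec(w_v[h], x)
            mixed = [m + scores[h][k] * vi for m, vi in zip(mixed, vx)]
        out = [o + ho for o, ho in zip(out, matvec(w_o[h], mixed))]
    return out


def mha_hypernetwork(xs, scores, w_v, w_o) -> Vector:
    """Same output, computed key by key with a generated linear network W(z_k)."""
    n_heads = len(w_v)
    products = [matmul(w_o[h], w_v[h]) for h in range(n_heads)]  # w_o[h] @ w_v[h]
    out = [0.0] * len(w_o[0])
    for k, x in enumerate(xs):
        z = [scores[h][k] for h in range(n_heads)]  # latent code for key k
        # Generated weights: W(z) = sum_h z[h] * (w_o[h] @ w_v[h])
        w = [[sum(z[h] * products[h][i][j] for h in range(n_heads))
              for j in range(len(x))]
             for i in range(len(products[0]))]
        out = [o + y for o, y in zip(out, matvec(w, x))]
    return out


# Hypothetical example: 2 heads, 2-dimensional tokens, 1-dimensional head values.
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [[0.2, 0.3, 0.5], [0.6, 0.1, 0.3]]  # one row of key scores per head
w_v = [[[1.0, 2.0]], [[0.5, -1.0]]]          # each head: 1 x 2
w_o = [[[1.0], [0.0]], [[2.0], [1.0]]]       # each head: 2 x 1

standard = mha_standard(xs, scores, w_v, w_o)
hyper = mha_hypernetwork(xs, scores, w_v, w_o)
assert all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(standard, hyper))
print(standard, hyper)
```

Both functions return the same vector (approximately `[2.4, 0.05]` for these inputs); the second makes explicit that the per-key network is linear in its input, which is the property the ablation above replaces with a nonlinearity.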

Uncovering mesa-optimization algorithms in Transformers

No code implementations · 11 Sep 2023 · Johannes von Oswald, Maximilian Schlegel, Alexander Meulemans, Seijin Kobayashi, Eyvind Niklasson, Nicolas Zucchet, Nino Scherrer, Nolan Miller, Mark Sandler, Blaise Agüera y Arcas, Max Vladymyrov, Razvan Pascanu, João Sacramento

Some autoregressive models exhibit in-context learning capabilities: being able to learn as an input sequence is processed, without undergoing any parameter changes, and without being explicitly trained to do so.

In-Context Learning, Language Modelling

Gated recurrent neural networks discover attention

No code implementations · 4 Sep 2023 · Nicolas Zucchet, Seijin Kobayashi, Yassir Akram, Johannes von Oswald, Maxime Larcher, Angelika Steger, João Sacramento

In particular, we examine RNNs trained to solve simple in-context learning tasks on which Transformers are known to excel and find that gradient descent instills in our RNNs the same attention-based in-context learning algorithm used by Transformers.
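Why a recurrent network can express attention at all can be seen from a standard identity, sketched here under simplifying assumptions of our own: with the softmax removed, causal self-attention y_t = sum over s <= t of v_s (k_s · q_t) can be computed by a recurrence whose fixed-size state is a running sum of outer products v_s k_s^T. This is the generic linear-attention identity, not the specific gated RNN architecture or the trained solution studied in the paper; function names and data are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def causal_linear_attention(qs, ks, vs) -> List[Vector]:
    """Parallel form: y_t = sum_{s <= t} v_s * (k_s . q_t), no softmax."""
    outputs = []
    for t, q in enumerate(qs):
        y = [0.0] * len(vs[0])
        for s in range(t + 1):
            weight = dot(ks[s], q)
            y = [yi + weight * vi for yi, vi in zip(y, vs[s])]
        outputs.append(y)
    return outputs


def recurrent_linear_attention(qs, ks, vs) -> List[Vector]:
    """Recurrent form: S_t = S_{t-1} + v_t k_t^T, y_t = S_t q_t."""
    d_v, d_k = len(vs[0]), len(ks[0])
    state = [[0.0] * d_k for _ in range(d_v)]  # fixed-size memory, d_v x d_k
    outputs = []
    for q, k, v in zip(qs, ks, vs):
        for i in range(d_v):
            for j in range(d_k):
                state[i][j] += v[i] * k[j]  # accumulate the outer product v k^T
        outputs.append([dot(row, q) for row in state])
    return outputs


qs = [[1.0, 1.0], [0.0, 1.0], [2.0, 0.0]]
ks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
vs = [[1.0], [2.0], [-1.0]]
parallel = causal_linear_attention(qs, ks, vs)
recurrent = recurrent_linear_attention(qs, ks, vs)
assert all(math.isclose(a, b) for pa, pr in zip(parallel, recurrent) for a, b in zip(pa, pr))
print(recurrent)
```

The recurrent form keeps a state of size d_v × d_k regardless of sequence length, which is what makes this attention computation reachable by a recurrent network; softmax attention has no such exact finite-state recurrence.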

In-Context Learning

Online learning of long-range dependencies

1 code implementation · NeurIPS 2023 · Nicolas Zucchet, Robert Meier, Simon Schug, Asier Mujika, João Sacramento

Online learning holds the promise of enabling efficient long-term credit assignment in recurrent neural networks.

Transformers learn in-context by gradient descent

1 code implementation · 15 Dec 2022 · Johannes von Oswald, Eyvind Niklasson, Ettore Randazzo, João Sacramento, Alexander Mordvintsev, Andrey Zhmoginov, Max Vladymyrov

We start by providing a simple weight construction that shows the equivalence of data transformations induced by 1) a single linear self-attention layer and by 2) gradient descent (GD) on a regression loss.
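The simplest instance of this correspondence can be checked numerically. A minimal sketch, under assumptions of our own: a scalar-output linear model f(x) = w · x starts at w = 0 and takes one gradient step with learning rate η on L(w) = 1/(2N) Σ_i (w · x_i - y_i)^2. Its prediction at a query x_q is then (η/N) Σ_i y_i (x_i · x_q), which is exactly a softmax-free attention readout with keys x_i, values y_i, query x_q and scale η/N. The paper's own weight construction is more general (it acts on token representations inside a self-attention layer); the function names and data below are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def gd_one_step_prediction(xs, ys, x_query, lr) -> float:
    """Predict at x_query after one GD step on L(w) = 1/(2N) sum (w.x_i - y_i)^2 from w = 0."""
    n, d = len(xs), len(x_query)
    w = [0.0] * d
    grad = [0.0] * d  # dL/dw = (1/N) * sum_i (w.x_i - y_i) * x_i
    for x, y in zip(xs, ys):
        residual = dot(w, x) - y
        for j in range(d):
            grad[j] += residual * x[j] / n
    w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return dot(w, x_query)


def linear_attention_output(keys, values, query, scale) -> float:
    """Softmax-free attention: scale * sum_i v_i * (k_i . q)."""
    return scale * sum(v * dot(k, query) for k, v in zip(keys, values))


xs = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # in-context inputs (keys)
ys = [1.0, -1.0, 0.5]                      # in-context targets (values)
x_query = [0.5, 1.0]
lr = 0.3

gd_pred = gd_one_step_prediction(xs, ys, x_query, lr)
attn_pred = linear_attention_output(xs, ys, x_query, lr / len(xs))
assert math.isclose(gd_pred, attn_pred)
print(gd_pred, attn_pred)
```

Both predictions agree up to floating-point rounding (about -0.075 here). The choice of w = 0 as the starting point is what removes the w-dependent term from the gradient and leaves a pure key-query-value sum.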

In-Context Learning, Meta-Learning

The least-control principle for local learning at equilibrium

1 code implementation · 4 Jul 2022 · Alexander Meulemans, Nicolas Zucchet, Seijin Kobayashi, Johannes von Oswald, João Sacramento

As special cases, they include models of great current interest in both neuroscience and machine learning, such as deep neural networks, equilibrium recurrent neural networks, deep equilibrium models, or meta-learning.

BIG-bench Machine Learning, Meta-Learning

Minimizing Control for Credit Assignment with Strong Feedback

2 code implementations · 14 Apr 2022 · Alexander Meulemans, Matilde Tristany Farinha, Maria R. Cervera, João Sacramento, Benjamin F. Grewe

Building upon deep feedback control (DFC), a recently proposed credit assignment method, we combine strong feedback influences on neural activity with gradient-based learning and show that this naturally leads to a novel view on neural network optimization.

Credit Assignment in Neural Networks through Deep Feedback Control

3 code implementations · NeurIPS 2021 · Alexander Meulemans, Matilde Tristany Farinha, Javier García Ordóñez, Pau Vilimelis Aceituno, João Sacramento, Benjamin F. Grewe

The success of deep learning sparked interest in whether the brain learns by using similar techniques for assigning credit to each synaptic weight for its contribution to the network output.

Conductance-based dendrites perform Bayes-optimal cue integration

No code implementations · 27 Apr 2021 · Jakob Jordan, João Sacramento, Willem A. M. Wybo, Mihai A. Petrovici, Walter Senn

We propose a novel, Bayesian view on the dynamics of conductance-based neurons and synapses which suggests that they are naturally equipped to optimally perform information integration.

A contrastive rule for meta-learning

1 code implementation · 4 Apr 2021 · Nicolas Zucchet, Simon Schug, Johannes von Oswald, Dominic Zhao, João Sacramento

Humans and other animals are capable of improving their learning performance as they solve related tasks from a given problem domain, to the point of being able to learn from extremely limited data.

Meta-Learning

Posterior Meta-Replay for Continual Learning

3 code implementations · NeurIPS 2021 · Christian Henning, Maria R. Cervera, Francesco D'Angelo, Johannes von Oswald, Regina Traber, Benjamin Ehret, Seijin Kobayashi, Benjamin F. Grewe, João Sacramento

We offer a practical deep learning implementation of our framework based on probabilistic task-conditioned hypernetworks, an approach we term posterior meta-replay.

Continual Learning

Neural networks with late-phase weights

2 code implementations · ICLR 2021 · Johannes von Oswald, Seijin Kobayashi, Alexander Meulemans, Christian Henning, Benjamin F. Grewe, João Sacramento

The largely successful method of training neural networks is to learn their weights using some variant of stochastic gradient descent (SGD).

Ranked #70 on Image Classification on CIFAR-100 (using extra training data)

Image Classification

A Theoretical Framework for Target Propagation

2 code implementations · NeurIPS 2020 · Alexander Meulemans, Francesco S. Carzaniga, Johan A. K. Suykens, João Sacramento, Benjamin F. Grewe

Here, we analyze target propagation (TP), a popular but not yet fully understood alternative to BP, from the standpoint of mathematical optimization.

Dendritic cortical microcircuits approximate the backpropagation algorithm

No code implementations · NeurIPS 2018 · João Sacramento, Rui Ponte Costa, Yoshua Bengio, Walter Senn

Deep learning has seen remarkable developments over the last years, many of them inspired by neuroscience.

Dendritic error backpropagation in deep cortical microcircuits

1 code implementation · 30 Dec 2017 · João Sacramento, Rui Ponte Costa, Yoshua Bengio, Walter Senn

Animal behaviour depends on learning to associate sensory stimuli with the desired motor command.

Denoising
