Search Results for author: Seijin Kobayashi

Found 9 papers, 6 papers with code

Neural networks with late-phase weights

2 code implementations • ICLR 2021 • Johannes von Oswald, Seijin Kobayashi, Alexander Meulemans, Christian Henning, Benjamin F. Grewe, João Sacramento

The largely successful method of training neural networks is to learn their weights using some variant of stochastic gradient descent (SGD).

Ranked #70 on Image Classification on CIFAR-100 (using extra training data)

Image Classification

1,360

Posterior Meta-Replay for Continual Learning

3 code implementations • NeurIPS 2021 • Christian Henning, Maria R. Cervera, Francesco D'Angelo, Johannes von Oswald, Regina Traber, Benjamin Ehret, Seijin Kobayashi, Benjamin F. Grewe, João Sacramento

We offer a practical deep learning implementation of our framework based on probabilistic task-conditioned hypernetworks, an approach we term posterior meta-replay.

Continual Learning

109

Learning where to learn: Gradient sparsity in meta and continual learning

1 code implementation • NeurIPS 2021 • Johannes von Oswald, Dominic Zhao, Seijin Kobayashi, Simon Schug, Massimo Caccia, Nicolas Zucchet, João Sacramento

We find that patterned sparsity emerges from this process, with the pattern of sparsity varying on a problem-by-problem basis.

Continual Learning • Inductive Bias • +2

The least-control principle for local learning at equilibrium

1 code implementation • 4 Jul 2022 • Alexander Meulemans, Nicolas Zucchet, Seijin Kobayashi, Johannes von Oswald, João Sacramento

As special cases, they include models of great current interest in both neuroscience and machine learning, such as deep neural networks, equilibrium recurrent neural networks, deep equilibrium models, or meta-learning.

BIG-bench Machine Learning • Meta-Learning

Meta-Learning via Classifier(-free) Diffusion Guidance

1 code implementation • 17 Oct 2022 • Elvis Nava, Seijin Kobayashi, Yifei Yin, Robert K. Katzschmann, Benjamin F. Grewe

Our methods repurpose the popular generative image synthesis techniques of natural language guidance and diffusion models to generate neural network weights adapted for tasks.

Few-Shot Learning • Image Generation • +2

Disentangling the Predictive Variance of Deep Ensembles through the Neural Tangent Kernel

no code implementations • 18 Oct 2022 • Seijin Kobayashi, Pau Vilimelis Aceituno, Johannes von Oswald

Identifying unfamiliar inputs, also known as out-of-distribution (OOD) detection, is a crucial property of any decision making process.

Decision Making • Inductive Bias • +1

Gated recurrent neural networks discover attention

no code implementations • 4 Sep 2023 • Nicolas Zucchet, Seijin Kobayashi, Yassir Akram, Johannes von Oswald, Maxime Larcher, Angelika Steger, João Sacramento

In particular, we examine RNNs trained to solve simple in-context learning tasks on which Transformers are known to excel and find that gradient descent instills in our RNNs the same attention-based in-context learning algorithm used by Transformers.

In-Context Learning

Uncovering mesa-optimization algorithms in Transformers

no code implementations • 11 Sep 2023 • Johannes von Oswald, Eyvind Niklasson, Maximilian Schlegel, Seijin Kobayashi, Nicolas Zucchet, Nino Scherrer, Nolan Miller, Mark Sandler, Blaise Agüera y Arcas, Max Vladymyrov, Razvan Pascanu, João Sacramento

Transformers have become the dominant model in deep learning, but the reason for their superior performance is poorly understood.

In-Context Learning • Language Modelling

Discovering modular solutions that generalize compositionally

1 code implementation • 22 Dec 2023 • Simon Schug, Seijin Kobayashi, Yassir Akram, Maciej Wołczyk, Alexandra Proca, Johannes von Oswald, Razvan Pascanu, João Sacramento, Angelika Steger

This allows us to relate the problem of compositional generalization to that of identification of the underlying modules.

Meta-Learning
