1 code implementation • 11 Feb 2022 • Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber
Linear layers in neural networks (NNs) trained by gradient descent can be expressed as a key-value memory system which stores all training datapoints and the initial weights, and produces outputs using unnormalised dot attention over the entire training experience.
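A minimal numerical sketch of this equivalence, assuming plain SGD on a squared-error loss (not the paper's code; all variable names are illustrative): the trained weights equal the initial weights plus a sum of outer products, so the output is the initial projection plus unnormalised dot attention over the stored training inputs (keys) and error signals (values).

```python
# Sketch: a linear layer trained by SGD equals its initial weights plus
# unnormalised dot attention over all training inputs and error signals.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, steps, lr = 4, 3, 50, 0.05

W0 = rng.normal(size=(d_out, d_in))
W = W0.copy()
keys, values = [], []  # training inputs and (learning-rate-scaled) error signals

for _ in range(steps):
    x = rng.normal(size=d_in)
    y_target = rng.normal(size=d_out)
    err = W @ x - y_target            # gradient of 0.5*||Wx - y||^2 w.r.t. Wx
    W -= lr * np.outer(err, x)        # primal form: ordinary SGD weight update
    keys.append(x)
    values.append(-lr * err)          # value paired with this training key

def dual_forward(x_test):
    # dual form: initial weights + attention over the entire training experience
    attn = sum(v * (k @ x_test) for k, v in zip(keys, values))
    return W0 @ x_test + attn

x_test = rng.normal(size=d_in)
print(np.allclose(W @ x_test, dual_forward(x_test)))  # True
```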
2 code implementations • 11 Feb 2022 • Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber
The weight matrix (WM) of a neural network (NN) is its program.
1 code implementation • 31 Dec 2021 • Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber
We share our experience with the recently released WILDS benchmark, a collection of ten datasets dedicated to developing models and training strategies which are robust to domain shifts.
1 code implementation • 14 Oct 2021 • Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber
Despite progress across a broad range of applications, Transformers have limited success in systematic generalization.
no code implementations • NeurIPS Workshop AIPLANS 2021 • Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber
Despite successes across a broad range of applications, Transformers have limited capability in systematic generalization.
no code implementations • ICLR 2022 • Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber
Despite successes across a broad range of applications, Transformers have limited capability in systematic generalization.
1 code implementation • EMNLP 2021 • Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber
Our models improve accuracy from 50% to 85% on the PCFG productivity split, and from 35% to 81% on COGS.
5 code implementations • NeurIPS 2021 • Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber
Transformers with linearised attention ("linear Transformers") have demonstrated the practical scalability and effectiveness of outer product-based Fast Weight Programmers (FWPs) from the '90s.
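A hedged sketch of this correspondence, assuming causal, unnormalised linear attention (names and shapes are illustrative, not the paper's implementation): each step writes the outer product v_t k_t^T into a fast weight matrix, and reading it out with the query reproduces the attention sum.

```python
# Sketch: outer product-based fast weight programming == causal linear attention.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_k, d_v = 6, 8, 5

K = rng.normal(size=(seq_len, d_k))   # keys   k_t
V = rng.normal(size=(seq_len, d_v))   # values v_t
Q = rng.normal(size=(seq_len, d_k))   # queries q_t

# FWP view: fast weights are programmed step by step with outer products.
W_fast = np.zeros((d_v, d_k))
fwp_out = []
for t in range(seq_len):
    W_fast += np.outer(V[t], K[t])    # write: W <- W + v_t k_t^T
    fwp_out.append(W_fast @ Q[t])     # read:  y_t = W q_t

# Attention view: y_t = sum_{s<=t} v_s (k_s . q_t), i.e. causal linear attention.
attn_out = [(V[: t + 1].T * (K[: t + 1] @ Q[t])).sum(axis=1) for t in range(seq_len)]

print(np.allclose(fwp_out, attn_out))  # True
```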
1 code implementation • ICLR 2021 • Róbert Csordás, Sjoerd van Steenkiste, Jürgen Schmidhuber
Neural networks (NNs) whose subnetworks implement reusable functions are expected to offer numerous advantages, including compositionality through efficient recombination of functional building blocks, interpretability, and prevention of catastrophic interference.
1 code implementation • 23 Apr 2019 • Róbert Csordás, Jürgen Schmidhuber
The Differentiable Neural Computer (DNC) can learn algorithmic and question answering tasks.