1 code implementation • 28 Oct 2024 • Julie Kallini, Shikhar Murty, Christopher D. Manning, Christopher Potts, Róbert Csordás
Models that rely on subword tokenization have significant drawbacks, such as sensitivity to character-level noise like spelling errors and inconsistent compression rates across different languages and scripts.
1 code implementation • 20 Aug 2024 • Róbert Csordás, Christopher Potts, Christopher D. Manning, Atticus Geiger
In this paper, we present a counterexample to this strong LRH: when trained to repeat an input token sequence, gated recurrent neural networks (RNNs) learn to represent the token at each position with a particular order of magnitude, rather than a direction.
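To make the "magnitude rather than direction" distinction concrete, here is a toy illustration, not the paper's trained RNN: each token keeps a fixed direction, and the position it occupies is encoded purely by how strongly that direction is scaled, yet the full sequence can still be decoded from the summed state. All names and sizes (GAMMA, token_dirs, etc.) are made up for this sketch.

```python
# Toy sketch: position encoded by magnitude (scale) of a fixed per-token direction.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, D = 4, 16                                  # illustrative vocabulary size and state dim
token_dirs = rng.normal(size=(VOCAB, D))
token_dirs /= np.linalg.norm(token_dirs, axis=1, keepdims=True)   # unit directions

GAMMA = 0.1                                        # per-position decay; magnitude marks position

def magnitude_code(tokens):
    """Encode a token sequence as a sum of token directions scaled by position."""
    state = np.zeros(D)
    for pos, tok in enumerate(tokens):
        state += (GAMMA ** pos) * token_dirs[tok]
    return state

def decode(state, length):
    """Read tokens back out position by position, largest magnitude first."""
    out = []
    for pos in range(length):
        scores = token_dirs @ state                # project state onto every token direction
        tok = int(np.argmax(scores))
        out.append(tok)
        state = state - (GAMMA ** pos) * token_dirs[tok]   # peel off the recovered component
    return out

seq = [2, 0, 3]
print(decode(magnitude_code(seq), len(seq)) == seq)   # True for small GAMMA
```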
1 code implementation • 25 May 2024 • Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber, Christopher Potts, Christopher D. Manning
The resulting UT model, for the first time, slightly outperforms standard Transformers on language modeling tasks such as BLiMP and PIQA, while using significantly less compute and memory.
2 code implementations • 13 Dec 2023 • Róbert Csordás, Piotr Piękos, Kazuki Irie, Jürgen Schmidhuber
Our novel SwitchHead is an effective MoE method for the attention layer that successfully reduces both the compute and memory requirements, achieving wall-clock speedup, while matching the language modeling performance of the baseline Transformer.
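As a rough illustration of the general idea, not SwitchHead's actual implementation, one way to make the attention layer sparse is to replace the value projection with a mixture of experts and compute only each token's selected expert. The sketch below uses a top-1 sigmoid router and otherwise standard softmax attention; all sizes and weight names are chosen for illustration only.

```python
# Simplified sketch of MoE applied inside an attention layer (illustrative, single head).
import numpy as np

rng = np.random.default_rng(0)
T, D, D_HEAD, N_EXP = 8, 32, 16, 4               # illustrative sizes

X = rng.normal(size=(T, D))                       # token representations
Wq = rng.normal(size=(D, D_HEAD)) / np.sqrt(D)
Wk = rng.normal(size=(D, D_HEAD)) / np.sqrt(D)
Wv = rng.normal(size=(N_EXP, D, D_HEAD)) / np.sqrt(D)   # one value projection per expert
Wr = rng.normal(size=(D, N_EXP)) / np.sqrt(D)     # router weights

Q, K = X @ Wq, X @ Wk                             # shared (dense) query/key projections

# Router: sigmoid scores, each token keeps only its top-1 value expert
gate = 1.0 / (1.0 + np.exp(-(X @ Wr)))            # (T, N_EXP)
expert = gate.argmax(axis=1)                       # chosen expert per token

V = np.empty((T, D_HEAD))
for e in range(N_EXP):                             # compute only the selected experts
    idx = np.where(expert == e)[0]
    if len(idx):
        V[idx] = gate[idx, e][:, None] * (X[idx] @ Wv[e])

# Standard causal softmax attention on top of the sparsely computed values
scores = (Q @ K.T) / np.sqrt(D_HEAD)
scores = np.where(np.tri(T, dtype=bool), scores, -np.inf)
attn = np.exp(scores - scores.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)
out = attn @ V
print(out.shape)                                   # (8, 16)
```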
1 code implementation • 1 Dec 2023 • Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber
General-purpose learning systems should improve themselves in an open-ended fashion in ever-changing environments.
1 code implementation • 24 Oct 2023 • Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber
Recent studies of the computational power of recurrent neural networks (RNNs) reveal a hierarchy of RNN architectures, given real-time and finite-precision assumptions.
2 code implementations • 16 Oct 2023 • Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber
Unlike prior work that compares MoEs with dense baselines under the compute-equal condition, our evaluation condition is parameter-equal, which is crucial to properly evaluate LMs.
no code implementations • 26 May 2023 • Mingchen Zhuge, Haozhe Liu, Francesco Faccio, Dylan R. Ashley, Róbert Csordás, Anand Gopalakrishnan, Abdullah Hamdi, Hasan Abed Al Kader Hammoud, Vincent Herrmann, Kazuki Irie, Louis Kirsch, Bing Li, Guohao Li, Shuming Liu, Jinjie Mai, Piotr Piękos, Aditya Ramesh, Imanol Schlag, Weimin Shi, Aleksandar Stanić, Wenyi Wang, Yuhui Wang, Mengmeng Xu, Deng-Ping Fan, Bernard Ghanem, Jürgen Schmidhuber
What should be the social structure of an NLSOM (natural language-based society of mind)?
1 code implementation • 26 May 2023 • Anian Ruoss, Grégoire Delétang, Tim Genewein, Jordi Grau-Moya, Róbert Csordás, Mehdi Bennani, Shane Legg, Joel Veness
Transformers have impressive generalization capabilities on tasks with a fixed context length.
1 code implementation • 15 Feb 2023 • Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber
Unsupervised learning of discrete representations in neural networks (NNs) from continuous ones is essential for many modern applications.
1 code implementation • 12 Oct 2022 • Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber
While the original CTL is used to test length generalization or productivity, CTL++ is designed to test systematicity of NNs, that is, their capability to generalize to unseen compositions of known functions.
2 code implementations • 22 Sep 2022 • Borja Ibarz, Vitaly Kurin, George Papamakarios, Kyriacos Nikiforou, Mehdi Bennani, Róbert Csordás, Andrew Dudzik, Matko Bošnjak, Alex Vitvitskyi, Yulia Rubanova, Andreea Deac, Beatrice Bevilacqua, Yaroslav Ganin, Charles Blundell, Petar Veličković
The cornerstone of neural algorithmic reasoning is the ability to solve algorithmic tasks, especially in a way that generalises out of distribution.
2 code implementations • 11 Feb 2022 • Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber
The weight matrix (WM) of a neural network (NN) is its program.
1 code implementation • 11 Feb 2022 • Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber
Linear layers in neural networks (NNs) trained by gradient descent can be expressed as a key-value memory system which stores all training datapoints and the initial weights, and produces outputs using unnormalised dot attention over the entire training experience.
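This equivalence can be checked numerically: unrolling plain SGD on a linear layer gives W = W_0 + Σ_t e_t x_t^T, where each e_t is the (learning-rate-scaled) error signal, so the output for a query x is W_0 x plus unnormalised dot attention with keys x_t and values e_t. A minimal sketch with illustrative sizes and names, not the paper's code:

```python
# Primal/dual equivalence for a linear layer trained with plain SGD (batch size 1).
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_OUT, STEPS, LR = 8, 4, 100, 0.05

W0 = rng.normal(size=(D_OUT, D_IN))               # initial weights
W = W0.copy()                                      # primal form: weights updated in place
keys, values = [], []                              # dual form: stored inputs and error signals

for _ in range(STEPS):
    x = rng.normal(size=D_IN)                      # training input
    target = rng.normal(size=D_OUT)                # arbitrary regression target
    y = W @ x
    grad_y = y - target                            # gradient of 0.5*||y - target||^2 w.r.t. y
    W -= LR * np.outer(grad_y, x)                  # SGD update (rank-1 outer product)
    keys.append(x)                                 # key = training input
    values.append(-LR * grad_y)                    # value = scaled error signal

q = rng.normal(size=D_IN)                          # test-time query
primal = W @ q
# Dual form: initial weights + unnormalised dot attention over the training experience
dual = W0 @ q + sum(v * (k @ q) for k, v in zip(keys, values))
print(np.allclose(primal, dual))                   # True (up to floating-point error)
```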
1 code implementation • 31 Dec 2021 • Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber
We share our experience with the recently released WILDS benchmark, a collection of ten datasets dedicated to developing models and training strategies which are robust to domain shifts.
1 code implementation • 14 Oct 2021 • Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber
Despite progress across a broad range of applications, Transformers have limited success in systematic generalization.
no code implementations • NeurIPS Workshop AIPLANS 2021 • Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber
Despite successes across a broad range of applications, Transformers have limited capability in systematic generalization.
no code implementations • ICLR 2022 • Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber
Despite successes across a broad range of applications, Transformers have limited capability in systematic generalization.
2 code implementations • EMNLP 2021 • Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber
Our models improve accuracy from 50% to 85% on the PCFG productivity split, and from 35% to 81% on COGS.
5 code implementations • NeurIPS 2021 • Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber
Transformers with linearised attention ("linear Transformers") have demonstrated the practical scalability and effectiveness of outer product-based Fast Weight Programmers (FWPs) from the '90s.
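The connection the abstract refers to is that causal linear attention with an identity feature map can be computed recurrently as a fast weight matrix written by outer products v_t k_t^T and read with the query. A minimal sketch of this well-known equivalence (illustrative, not the paper's code):

```python
# Unnormalised causal linear attention == outer product Fast Weight Programmer.
import numpy as np

rng = np.random.default_rng(0)
T, D = 6, 5                                        # sequence length and head dimension
K = rng.normal(size=(T, D))                        # keys    k_t
V = rng.normal(size=(T, D))                        # values  v_t
Q = rng.normal(size=(T, D))                        # queries q_t

# 1) Attention view: causal, unnormalised, identity feature map
scores = np.tril(Q @ K.T)                          # keep only i <= t
attn_out = scores @ V                              # y_t = sum_{i<=t} (q_t . k_i) v_i

# 2) Fast Weight Programmer view: recurrent rank-1 updates
W = np.zeros((D, D))                               # fast weight matrix, starts empty
fwp_out = np.zeros((T, D))
for t in range(T):
    W += np.outer(V[t], K[t])                      # write: W_t = W_{t-1} + v_t k_t^T
    fwp_out[t] = W @ Q[t]                          # read:  y_t = W_t q_t

print(np.allclose(attn_out, fwp_out))              # True
```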
1 code implementation • ICLR 2021 • Róbert Csordás, Sjoerd van Steenkiste, Jürgen Schmidhuber
Neural networks (NNs) whose subnetworks implement reusable functions are expected to offer numerous advantages, including compositionality through efficient recombination of functional building blocks, better interpretability, and reduced catastrophic interference.
1 code implementation • 23 Apr 2019 • Róbert Csordás, Jürgen Schmidhuber
The Differentiable Neural Computer (DNC) can learn algorithmic and question answering tasks.