Search Results for author: Róbert Csordás

Found 22 papers, 19 papers with code

MrT5: Dynamic Token Merging for Efficient Byte-level Language Models

1 code implementation • 28 Oct 2024 • Julie Kallini, Shikhar Murty, Christopher D. Manning, Christopher Potts, Róbert Csordás

Models that rely on subword tokenization have significant drawbacks, such as sensitivity to character-level noise like spelling errors and inconsistent compression rates across different languages and scripts.
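
The mechanism named in the title can be pictured as a small gating module that scores byte-level hidden states and keeps only a fraction of them, so later layers run on a shorter sequence. The module below is a hypothetical sketch of that idea; the names, the hard top-k selection, and the fixed keep ratio are illustrative assumptions, not MrT5's actual deletion gating.

```python
import torch
import torch.nn as nn

class TokenDeletionGate(nn.Module):
    """Toy gate that scores byte-level tokens and keeps only the most useful ones.

    Hypothetical sketch of dynamic token merging/deletion; not MrT5's implementation.
    """
    def __init__(self, d_model: int, keep_ratio: float = 0.5):
        super().__init__()
        self.score = nn.Linear(d_model, 1)          # per-token "keep" logit
        self.keep_ratio = keep_ratio

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d_model) byte-level encoder states
        batch, seq_len, _ = hidden.shape
        logits = self.score(hidden).squeeze(-1)     # (batch, seq_len)
        k = max(1, int(seq_len * self.keep_ratio))  # target shorter length
        keep_idx = logits.topk(k, dim=-1).indices.sort(dim=-1).values
        # gather the surviving tokens; downstream layers see a shorter sequence
        return hidden.gather(1, keep_idx.unsqueeze(-1).expand(-1, -1, hidden.size(-1)))

x = torch.randn(2, 16, 64)
print(TokenDeletionGate(64)(x).shape)               # torch.Size([2, 8, 64])
```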

Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations

1 code implementation • 20 Aug 2024 • Róbert Csordás, Christopher Potts, Christopher D. Manning, Atticus Geiger

In this paper, we present a counterexample to this strong LRH (linear representation hypothesis): when trained to repeat an input token sequence, gated recurrent neural networks (RNNs) learn to represent the token at each position with a particular order of magnitude, rather than a direction.

Position
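
The magnitude-based code described above can be imitated by hand: give the token at position t its own order of magnitude within a single state vector, then peel the positions off one scale at a time. The toy below is a hand-built illustration of that coding scheme, not the trained RNN's learned solution; the vocabulary, base, and one-hot directions are arbitrary.

```python
import numpy as np

# Hand-built illustration of a magnitude-based ("onion") code: the token at each
# position occupies its own order of magnitude of a single state vector, rather
# than its own direction. A toy decoder, not the trained RNN's actual solution.
vocab, base = 8, 10.0
emb = np.eye(vocab)                                   # one direction per token id

tokens = [3, 1, 4, 1, 5]
state = sum(emb[tok] * base ** -t for t, tok in enumerate(tokens))

decoded = []
for t in range(len(tokens)):
    residual = state - sum(emb[tok] * base ** -s for s, tok in enumerate(decoded))
    decoded.append(int(np.argmax(residual * base ** t @ emb.T)))  # position t now dominates
print(decoded == tokens)                              # True
```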

MoEUT: Mixture-of-Experts Universal Transformers

1 code implementation • 25 May 2024 • Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber, Christopher Potts, Christopher D. Manning

The resulting UT model, for the first time, slightly outperforms standard Transformers on language modeling tasks such as BLiMP and PIQA, while using significantly less compute and memory.

Language Modeling · Language Modelling
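
A rough picture of the two ingredients in the title is a single Transformer block whose parameters are reused at every depth (the Universal Transformer part), with a token-routed mixture-of-experts feedforward inside it (the MoE part). The sketch below assumes exactly that and nothing more; MoEUT's layer grouping, MoE attention, and normalization scheme are omitted, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class MoEFeedForward(nn.Module):
    """Token-wise top-k mixture-of-experts FFN (illustrative, not MoEUT's exact design)."""
    def __init__(self, d_model=256, d_expert=128, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.w_in = nn.Parameter(torch.randn(n_experts, d_model, d_expert) * 0.02)
        self.w_out = nn.Parameter(torch.randn(n_experts, d_expert, d_model) * 0.02)
        self.top_k = top_k

    def forward(self, x):                                  # x: (batch, seq, d_model)
        scores = self.router(x).softmax(-1)                # (batch, seq, n_experts)
        weight, idx = scores.topk(self.top_k, dim=-1)      # route each token to k experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            w_in = self.w_in[idx[..., k]]                  # gather per-token expert weights
            w_out = self.w_out[idx[..., k]]
            h = torch.relu(torch.einsum('bsd,bsde->bse', x, w_in))
            out += weight[..., k:k + 1] * torch.einsum('bse,bsed->bsd', h, w_out)
        return out

class SharedBlock(nn.Module):
    """One Transformer block reused at every depth, as in Universal Transformers."""
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = MoEFeedForward(d_model)
        self.n1, self.n2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x):
        h = self.n1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.ffn(self.n2(x))

x = torch.randn(2, 10, 256)
block = SharedBlock()
for _ in range(6):        # the same parameters are applied at every layer
    x = block(x)
print(x.shape)
```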

SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention

2 code implementations • 13 Dec 2023 • Róbert Csordás, Piotr Piękos, Kazuki Irie, Jürgen Schmidhuber

Our novel SwitchHead is an effective MoE method for the attention layer that successfully reduces both the compute and memory requirements, achieving wall-clock speedup, while matching the language modeling performance of the baseline Transformer.

Language Modeling · Language Modelling
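
The general shape of MoE attention can be illustrated with a single head whose value and output projections are picked from a small set of experts by a per-token router, while queries and keys stay dense. The code below is a simplified sketch under those assumptions, not SwitchHead's actual routing, sizes, or multi-head layout.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchValueHead(nn.Module):
    """One attention head whose value/output projections are mixtures of experts.

    Minimal single-head sketch in the spirit of MoE attention; illustrative only.
    """
    def __init__(self, d_model=128, d_head=32, n_experts=4):
        super().__init__()
        self.q = nn.Linear(d_model, d_head)
        self.k = nn.Linear(d_model, d_head)
        self.v_experts = nn.Parameter(torch.randn(n_experts, d_model, d_head) * 0.02)
        self.o_experts = nn.Parameter(torch.randn(n_experts, d_head, d_model) * 0.02)
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x):                                   # x: (batch, seq, d_model)
        gate = torch.sigmoid(self.router(x))                # per-token expert gates
        idx = gate.argmax(-1)                               # pick one expert per token
        v = torch.einsum('bsd,bsdh->bsh', x, self.v_experts[idx])
        scores = self.q(x) @ self.k(x).transpose(-2, -1) / self.q(x).size(-1) ** 0.5
        head = F.softmax(scores, dim=-1) @ v                # (batch, seq, d_head)
        out = torch.einsum('bsh,bshd->bsd', head, self.o_experts[idx])
        return gate.gather(-1, idx.unsqueeze(-1)) * out     # scale by the chosen gate

print(SwitchValueHead()(torch.randn(2, 10, 128)).shape)     # torch.Size([2, 10, 128])
```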

Automating Continual Learning

1 code implementation • 1 Dec 2023 • Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber

General-purpose learning systems should improve themselves in an open-ended fashion in ever-changing environments.

Continual Learning · Image Classification · +2

Practical Computational Power of Linear Transformers and Their Recurrent and Self-Referential Extensions

1 code implementation • 24 Oct 2023 • Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber

Recent studies of the computational power of recurrent neural networks (RNNs) reveal a hierarchy of RNN architectures, given real-time and finite-precision assumptions.

Approximating Two-Layer Feedforward Networks for Efficient Transformers

2 code implementations • 16 Oct 2023 • Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber

Unlike prior work that compares MoEs with dense baselines under the compute-equal condition, our evaluation condition is parameter-equal, which is crucial to properly evaluate LMs.
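
The parameter-equal condition can be made concrete with a little arithmetic: shrink each expert so that all experts together hold exactly as many parameters as the dense feedforward layer; the MoE then differs only in compute per token. The numbers below are illustrative sizes, not the paper's configurations.

```python
# Parameter-equal comparison of a dense FFN and an MoE FFN (illustrative sizes).
# Both layers hold the same number of parameters; the MoE spends fewer FLOPs per
# token because only the top-k experts are evaluated.
d_model, d_ff = 512, 2048
n_experts, top_k = 8, 2
d_expert = d_ff // n_experts                        # shrink experts so params match

dense_params = 2 * d_model * d_ff                   # W_in and W_out
moe_params = n_experts * 2 * d_model * d_expert     # all experts' W_in and W_out
dense_flops_per_token = 2 * d_model * d_ff
moe_flops_per_token = top_k * 2 * d_model * d_expert

print(dense_params == moe_params)                   # True  (parameter-equal)
print(moe_flops_per_token / dense_flops_per_token)  # 0.25  (compute saved)
```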

Self-Organising Neural Discrete Representation Learning à la Kohonen

1 code implementation • 15 Feb 2023 • Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber

Unsupervised learning of discrete representations in neural networks (NNs) from continuous ones is essential for many modern applications.

Representation Learning

CTL++: Evaluating Generalization on Never-Seen Compositional Patterns of Known Functions, and Compatibility of Neural Representations

1 code implementation • 12 Oct 2022 • Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber

While the original CTL is used to test length generalization or productivity, CTL++ is designed to test systematicity of NNs, that is, their capability to generalize to unseen compositions of known functions.

The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns via Spotlights of Attention

1 code implementation • 11 Feb 2022 • Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber

Linear layers in neural networks (NNs) trained by gradient descent can be expressed as a key-value memory system which stores all training datapoints and the initial weights, and produces outputs using unnormalised dot attention over the entire training experience.

Continual Learning · Image Classification · +1
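
That equivalence is easy to verify numerically for a single linear layer trained with squared error: the final weights equal the initial weights plus a sum of rank-one updates, so any test prediction decomposes into the initial prediction plus unnormalised dot attention over the stored training inputs. The toy check below assumes online gradient descent and a squared loss; the sizes are arbitrary.

```python
import numpy as np

# Numerical check of the dual form: a linear layer trained by online gradient
# descent equals its initial weights plus unnormalised dot attention over the
# stored training inputs.
rng = np.random.default_rng(0)
d_in, d_out, steps, lr = 5, 3, 50, 0.1
W = W0 = rng.normal(size=(d_out, d_in))

keys, values = [], []                               # the "key-value memory"
for _ in range(steps):
    x = rng.normal(size=d_in)                       # training input (key)
    target = rng.normal(size=d_out)
    err = W @ x - target                            # error under current weights
    keys.append(x)
    values.append(-lr * err)                        # value: the update's "payload"
    W = W + np.outer(values[-1], x)                 # rank-one gradient step

x_test = rng.normal(size=d_in)
primal = W @ x_test
dual = W0 @ x_test + sum(v * (k @ x_test) for k, v in zip(keys, values))
print(np.allclose(primal, dual))                    # True
```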

Improving Baselines in the Wild

1 code implementation • 31 Dec 2021 • Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber

We share our experience with the recently released WILDS benchmark, a collection of ten datasets dedicated to developing models and training strategies which are robust to domain shifts.

The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization

1 code implementation • 14 Oct 2021 • Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber

Despite progress across a broad range of applications, Transformers have limited success in systematic generalization.

ListOps · Systematic Generalization

Going Beyond Linear Transformers with Recurrent Fast Weight Programmers

5 code implementations • NeurIPS 2021 • Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber

Transformers with linearised attention ("linear Transformers") have demonstrated the practical scalability and effectiveness of outer product-based Fast Weight Programmers (FWPs) from the '90s.

Atari Games · ListOps
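
The outer product-based fast weight programmer mentioned above boils down to a slow network emitting keys, values, and queries, with a fast weight matrix rewritten by an outer product at every step and then queried. The sketch below shows that purely additive update with toy sizes and random slow weights; the recurrent and self-referential extensions studied in the paper (feeding the output back into the slow network) are left out.

```python
import numpy as np

# Minimal outer-product Fast Weight Programmer, i.e. the linear-attention update
# underlying "linear Transformers": the fast weight matrix is reprogrammed by an
# outer product at every step and then queried. Toy sizes, illustrative only.
rng = np.random.default_rng(0)
d_model, d_key = 16, 8
Wk, Wv, Wq = (rng.normal(size=(d_key, d_model)) for _ in range(3))

fast_W = np.zeros((d_key, d_key))                  # fast weights, start empty
for t in range(20):
    x = rng.normal(size=d_model)                   # input at step t
    k, v, q = Wk @ x, Wv @ x, Wq @ x
    k = np.maximum(k, 0); k /= k.sum() + 1e-6      # simple positive, normalised key
    fast_W += np.outer(v, k)                       # program the fast net: W += v k^T
    y = fast_W @ q                                 # query the fast net
print(y.shape)                                     # (8,)
```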

Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks

1 code implementation • ICLR 2021 • Róbert Csordás, Sjoerd van Steenkiste, Jürgen Schmidhuber

Neural networks (NNs) whose subnetworks implement reusable functions are expected to offer numerous advantages, including compositionality through efficient recombination of functional building blocks, interpretability, preventing catastrophic interference, etc.

Systematic Generalization
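
The inspection method in the title can be pictured as freezing a trained network and learning one logit per weight, so that a (near-)binary mask selects the subnetwork responsible for a given subtask. The sketch below uses a simple deterministic straight-through estimator on a frozen toy layer rather than stochastic mask sampling, and all sizes and the toy task are made up.

```python
import torch

# Minimal sketch of inspecting a frozen layer with a differentiable binary weight
# mask: the weights stay fixed, and only per-weight mask logits are trained so
# that the masked subnetwork still solves the (toy) task. Illustrative only.
torch.manual_seed(0)
W = torch.randn(16, 16)                            # frozen, "trained" weights
x = torch.randn(128, 16)
target = x @ W.T                                   # toy task the subnetwork must keep solving

logits = torch.zeros_like(W, requires_grad=True)   # one logit per weight
opt = torch.optim.Adam([logits], lr=0.1)
for _ in range(200):
    probs = torch.sigmoid(logits)
    hard = (probs > 0.5).float()
    mask = hard + probs - probs.detach()           # straight-through: hard forward, soft backward
    loss = ((x @ (W * mask).T - target) ** 2).mean() + 1e-3 * probs.mean()  # sparsity prior
    opt.zero_grad(); loss.backward(); opt.step()

kept = (torch.sigmoid(logits) > 0.5).float().mean().item()
print(f"kept {kept:.0%} of the weights")
```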
