Search Results for author: Imanol Schlag

Found 18 papers, 13 papers with code

Language Imbalance Can Boost Cross-lingual Generalisation

2 code implementations • 11 Apr 2024 • Anton Schäfer, Shauli Ravfogel, Thomas Hofmann, Tiago Pimentel, Imanol Schlag

In controlled experiments on perfectly equivalent cloned languages, we observe that the existence of a predominant language during training boosts the performance of less frequent languages and leads to stronger alignment of model representations across languages.

Language Modelling
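The "perfectly equivalent cloned languages" setup can be pictured with a toy sketch: every token id is mirrored into a second, disjoint id range, so the two "languages" are statistically identical but share no surface forms, and one of them is simply sampled more often during training. The Python/NumPy snippet below is a hypothetical reconstruction for illustration only, not the authors' code; the offset-based cloning and the 90/10 mixing ratio are assumptions.

```python
import numpy as np

VOCAB_SIZE = 1000          # tokens 0..999 form "language A" (assumed toy vocabulary)
CLONE_OFFSET = VOCAB_SIZE  # tokens 1000..1999 are the cloned "language B"

def clone_sequence(tokens: np.ndarray) -> np.ndarray:
    """Map a language-A token sequence to its perfect clone in language B."""
    return tokens + CLONE_OFFSET

def make_imbalanced_corpus(sequences, p_main=0.9, seed=0):
    """Assign each sequence to the predominant language with probability p_main,
    otherwise to its low-resource clone (the 90/10 split is illustrative)."""
    rng = np.random.default_rng(seed)
    corpus = []
    for seq in sequences:
        if rng.random() < p_main:
            corpus.append(seq)                  # predominant language A
        else:
            corpus.append(clone_sequence(seq))  # less frequent clone B
    return corpus

# Example: two toy "documents" drawn from the same distribution.
docs = [np.array([5, 42, 7]), np.array([9, 1, 3])]
print(make_imbalanced_corpus(docs))
```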

On the Effect of (Near) Duplicate Subwords in Language Modelling

2 code implementations • 9 Apr 2024 • Anton Schäfer, Thomas Hofmann, Imanol Schlag, Tiago Pimentel

In this paper, we study the impact of near duplicate subwords on LM training efficiency.

Language Modelling
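Near-duplicate subwords are vocabulary entries such as "the", "The" and "▁the" that a tokenizer stores separately although they are nearly interchangeable. The small sketch below groups such entries in a vocabulary; the normalisation rule is an assumed heuristic for illustration, not the paper's exact definition of (near) duplication.

```python
from collections import defaultdict

def near_duplicate_groups(vocab):
    """Group subwords that differ only by case or a leading word-boundary marker.
    This normalisation is an illustrative choice, not the paper's definition."""
    groups = defaultdict(list)
    for token in vocab:
        key = token.lstrip("▁ ").lower()  # strip SentencePiece-style marker, ignore case
        groups[key].append(token)
    return {k: v for k, v in groups.items() if len(v) > 1}

vocab = ["the", "The", "▁the", "cat", "▁Cat", "dog"]
print(near_duplicate_groups(vocab))
# {'the': ['the', 'The', '▁the'], 'cat': ['cat', '▁Cat']}
```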

Navigating Scaling Laws: Compute Optimality in Adaptive Model Training

no code implementations • 6 Nov 2023 • Sotiris Anagnostidis, Gregor Bachmann, Imanol Schlag, Thomas Hofmann

This leads to the notion of a 'compute-optimal' model, i.e. a model that optimally allocates a given level of compute during training to maximize performance.
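In the usual Chinchilla-style framing, training compute is approximated as C ≈ 6·N·D for N parameters and D training tokens, and a compute-optimal model chooses N and D to minimise loss under a fixed C. The sketch below illustrates that idea only; the parametric loss and its constants are illustrative (in the spirit of Hoffmann et al.), not values fitted in this paper, which studies how to adapt the allocation during training.

```python
import numpy as np

def loss(N, D, E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Chinchilla-style parametric loss L(N, D) = E + A/N^alpha + B/D^beta.
    Constants are illustrative placeholders, not values from this paper."""
    return E + A / N**alpha + B / D**beta

def compute_optimal_split(C, n_grid=200):
    """Search over parameter counts N; tokens follow from the C ≈ 6·N·D approximation."""
    Ns = np.logspace(7, 12, n_grid)      # 10M .. 1T parameters
    Ds = C / (6.0 * Ns)                  # tokens affordable for each N
    losses = loss(Ns, Ds)
    i = int(np.argmin(losses))
    return Ns[i], Ds[i], losses[i]

N_opt, D_opt, L_opt = compute_optimal_split(C=1e21)   # a 10^21 FLOP budget
print(f"N≈{N_opt:.2e} params, D≈{D_opt:.2e} tokens, loss≈{L_opt:.3f}")
```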

Large Language Model Programs

no code implementations • 9 May 2023 • Imanol Schlag, Sainbayar Sukhbaatar, Asli Celikyilmaz, Wen-tau Yih, Jason Weston, Jürgen Schmidhuber, Xian Li

In recent years, large pre-trained language models (LLMs) have demonstrated the ability to follow instructions and perform novel tasks from a few examples.

Language Modelling • Large Language Model +1

Block-Recurrent Transformers

3 code implementations • 11 Mar 2022 • DeLesley Hutchins, Imanol Schlag, Yuhuai Wu, Ethan Dyer, Behnam Neyshabur

The recurrent cell of the Block-Recurrent Transformer is merely a transformer layer: it uses self-attention and cross-attention to efficiently compute a recurrent function over a large set of state vectors and tokens.

Language Modelling
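The "merely a transformer layer" description can be made concrete: within each block, tokens attend to themselves and to a small set of recurrent state vectors, and the state vectors attend back to the block to produce the next state. The single-head NumPy sketch below is a simplification under assumed shapes and names, omitting the gating, LayerNorm, projections and MLPs of the actual cell.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def block_recurrent_step(tokens, state):
    """One recurrent step over a block of tokens.
    tokens: (block_len, d), state: (num_state_vectors, d)."""
    kv = np.vstack([tokens, state])
    # Vertical direction: tokens attend to themselves and to the current state.
    token_out = attention(tokens, kv, kv)
    # Horizontal direction: state vectors attend to tokens (and themselves),
    # i.e. the recurrent function over the set of state vectors.
    next_state = attention(state, kv, kv)
    return token_out, next_state

d, block_len, num_state = 16, 8, 4
rng = np.random.default_rng(0)
tokens, state = rng.normal(size=(block_len, d)), rng.normal(size=(num_state, d))
out, state = block_recurrent_step(tokens, state)
print(out.shape, state.shape)  # (8, 16) (4, 16)
```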

Improving Baselines in the Wild

1 code implementation • 31 Dec 2021 • Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber

We share our experience with the recently released WILDS benchmark, a collection of ten datasets dedicated to developing models and training strategies which are robust to domain shifts.

Going Beyond Linear Transformers with Recurrent Fast Weight Programmers

5 code implementations • NeurIPS 2021 • Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber

Transformers with linearised attention ("linear Transformers") have demonstrated the practical scalability and effectiveness of outer product-based Fast Weight Programmers (FWPs) from the '90s.

Atari Games • ListOps

Linear Transformers Are Secretly Fast Weight Programmers

9 code implementations • 22 Feb 2021 • Imanol Schlag, Kazuki Irie, Jürgen Schmidhuber

We show the formal equivalence of linearised self-attention mechanisms and fast weight controllers from the early '90s, where a "slow" neural net learns by gradient descent to program the "fast weights" of another net through sequences of elementary programming instructions which are additive outer products of self-invented activation patterns (today called keys and values).

Language Modelling • Machine Translation +2
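The stated equivalence is easiest to see in update-rule form: the fast weight matrix is a running sum of outer products of values and (feature-mapped) keys, and the output is that matrix applied to the query, which is exactly unnormalised linear attention. Below is a minimal NumPy sketch of this additive Fast Weight Programmer; the feature map, the omitted normalisation, and the shapes are illustrative choices rather than the paper's exact configuration.

```python
import numpy as np

def elu_plus_one(x):
    """A positive feature map as used in linear attention (an illustrative choice)."""
    return np.where(x > 0, x + 1.0, np.exp(x))

def fast_weight_sequence(keys, values, queries):
    """Fast Weight Programmer view of linear attention:
    W_t = W_{t-1} + v_t ⊗ φ(k_t), output y_t = W_t φ(q_t) (normalisation omitted)."""
    d_k, d_v = keys.shape[1], values.shape[1]
    W = np.zeros((d_v, d_k))          # the fast weight matrix, programmed on the fly
    outputs = []
    for k, v, q in zip(keys, values, queries):
        W += np.outer(v, elu_plus_one(k))       # additive outer-product "write"
        outputs.append(W @ elu_plus_one(q))     # associative "read" with the query
    return np.stack(outputs)

T, d_k, d_v = 5, 8, 8
rng = np.random.default_rng(1)
ys = fast_weight_sequence(rng.normal(size=(T, d_k)),
                          rng.normal(size=(T, d_v)),
                          rng.normal(size=(T, d_k)))
print(ys.shape)  # (5, 8)
```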

Enhancing the Transformer with Explicit Relational Encoding for Math Problem Solving

3 code implementations • 15 Oct 2019 • Imanol Schlag, Paul Smolensky, Roland Fernandez, Nebojsa Jojic, Jürgen Schmidhuber, Jianfeng Gao

We incorporate Tensor-Product Representations within the Transformer in order to better support the explicit representation of relation structure.

Math Question Answering
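A Tensor-Product Representation binds a "filler" vector to a "role" vector with an outer product and sums the bindings, so that an (approximately) orthogonal role can later unbind its filler. The sketch below shows only this generic binding/unbinding mechanism, not the TP-Transformer layer itself, whose roles and fillers come from learned projections.

```python
import numpy as np

def bind(fillers, roles):
    """TPR: sum of outer products filler_i ⊗ role_i."""
    return sum(np.outer(f, r) for f, r in zip(fillers, roles))

def unbind(tpr, role):
    """Recover the filler bound to a role; exact if the roles are orthonormal."""
    return tpr @ role

rng = np.random.default_rng(0)
roles = np.linalg.qr(rng.normal(size=(4, 4)))[0]     # orthonormal role vectors
fillers = rng.normal(size=(3, 6))                    # three filler vectors
T = bind(fillers, roles[:3])
print(np.allclose(unbind(T, roles[0]), fillers[0]))  # True
```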

Learning to Reason with Third Order Tensor Products

1 code implementation • NeurIPS 2018 • Imanol Schlag, Jürgen Schmidhuber

We combine Recurrent Neural Networks with Tensor Product Representations to learn combinatorial representations of sequential data.

Learning to Reason with Third-Order Tensor Products

1 code implementation • 29 Nov 2018 • Imanol Schlag, Jürgen Schmidhuber

We combine Recurrent Neural Networks with Tensor Product Representations to learn combinatorial representations of sequential data.

Gated Fast Weights for Associative Retrieval

no code implementations • ICLR 2018 • Imanol Schlag, Jürgen Schmidhuber

We improve previous end-to-end differentiable neural networks (NNs) with fast weight memories.

Retrieval
