Search Results for author: Imanol Schlag

Found 18 papers, 13 papers with code

Language Imbalance Can Boost Cross-lingual Generalisation

2 code implementations • 11 Apr 2024 • Anton Schäfer, Shauli Ravfogel, Thomas Hofmann, Tiago Pimentel, Imanol Schlag

In controlled experiments on perfectly equivalent cloned languages, we observe that the existence of a predominant language during training boosts the performance of less frequent languages and leads to stronger alignment of model representations across languages.

Language Modelling
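The "perfectly equivalent cloned languages" setup can be pictured with a toy sketch: every token id is mirrored into a second, disjoint id range, so the two "languages" are statistically identical but share no surface forms, and one of them is simply sampled more often during training. The Python/NumPy snippet below is a hypothetical reconstruction for illustration only, not the authors' code; the offset-based cloning and the 90/10 mixing ratio are assumptions.

```python
import numpy as np

VOCAB_SIZE = 1000          # tokens 0..999 form "language A" (assumed toy vocabulary)
CLONE_OFFSET = VOCAB_SIZE  # tokens 1000..1999 are the cloned "language B"

def clone_sequence(tokens: np.ndarray) -> np.ndarray:
    """Map a language-A token sequence to its perfect clone in language B."""
    return tokens + CLONE_OFFSET

def make_imbalanced_corpus(sequences, p_main=0.9, seed=0):
    """Assign each sequence to the predominant language with probability p_main,
    otherwise to its low-resource clone (the 90/10 split is illustrative)."""
    rng = np.random.default_rng(seed)
    corpus = []
    for seq in sequences:
        if rng.random() < p_main:
            corpus.append(seq)                  # predominant language A
        else:
            corpus.append(clone_sequence(seq))  # less frequent clone B
    return corpus

# Example: two toy "documents" drawn from the same distribution.
docs = [np.array([5, 42, 7]), np.array([9, 1, 3])]
print(make_imbalanced_corpus(docs))
```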

On the Effect of (Near) Duplicate Subwords in Language Modelling

2 code implementations • 9 Apr 2024 • Anton Schäfer, Thomas Hofmann, Imanol Schlag, Tiago Pimentel

In this paper, we study the impact of near duplicate subwords on LM training efficiency.

Language Modelling
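Near-duplicate subwords are vocabulary entries such as "the", "The" and "▁the" that a tokenizer stores separately although they are nearly interchangeable. The small sketch below groups such entries in a vocabulary; the normalisation rule is an assumed heuristic for illustration, not the paper's exact definition of (near) duplication.

```python
from collections import defaultdict

def near_duplicate_groups(vocab):
    """Group subwords that differ only by case or a leading word-boundary marker.
    This normalisation is an illustrative choice, not the paper's definition."""
    groups = defaultdict(list)
    for token in vocab:
        key = token.lstrip("▁ ").lower()  # strip SentencePiece-style marker, ignore case
        groups[key].append(token)
    return {k: v for k, v in groups.items() if len(v) > 1}

vocab = ["the", "The", "▁the", "cat", "▁Cat", "dog"]
print(near_duplicate_groups(vocab))
# {'the': ['the', 'The', '▁the'], 'cat': ['cat', '▁Cat']}
```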

Navigating Scaling Laws: Compute Optimality in Adaptive Model Training

no code implementations • 6 Nov 2023 • Sotiris Anagnostidis, Gregor Bachmann, Imanol Schlag, Thomas Hofmann

This leads to the notion of a 'compute-optimal' model, i.e. a model that optimally allocates a given level of compute during training to maximize performance.
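In the usual Chinchilla-style framing, training compute is approximated as C ≈ 6·N·D for N parameters and D training tokens, and a compute-optimal model chooses N and D to minimise loss under a fixed C. The sketch below illustrates that idea only; the parametric loss and its constants are illustrative (in the spirit of Hoffmann et al.), not values fitted in this paper, which studies how to adapt the allocation during training.

```python
import numpy as np

def loss(N, D, E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Chinchilla-style parametric loss L(N, D) = E + A/N^alpha + B/D^beta.
    Constants are illustrative placeholders, not values from this paper."""
    return E + A / N**alpha + B / D**beta

def compute_optimal_split(C, n_grid=200):
    """Search over parameter counts N; tokens follow from the C ≈ 6·N·D approximation."""
    Ns = np.logspace(7, 12, n_grid)      # 10M .. 1T parameters
    Ds = C / (6.0 * Ns)                  # tokens affordable for each N
    losses = loss(Ns, Ds)
    i = int(np.argmin(losses))
    return Ns[i], Ds[i], losses[i]

N_opt, D_opt, L_opt = compute_optimal_split(C=1e21)   # a 10^21 FLOP budget
print(f"N≈{N_opt:.2e} params, D≈{D_opt:.2e} tokens, loss≈{L_opt:.3f}")
```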

Large Language Model Programs

no code implementations • 9 May 2023 • Imanol Schlag, Sainbayar Sukhbaatar, Asli Celikyilmaz, Wen-tau Yih, Jason Weston, Jürgen Schmidhuber, Xian Li

In recent years, large pre-trained language models (LLMs) have demonstrated the ability to follow instructions and perform novel tasks from a few examples.

Language Modelling • Large Language Model +1

Block-Recurrent Transformers

3 code implementations • 11 Mar 2022 • DeLesley Hutchins, Imanol Schlag, Yuhuai Wu, Ethan Dyer, Behnam Neyshabur

The recurrent cell of the Block-Recurrent Transformer is merely a transformer layer: it uses self-attention and cross-attention to efficiently compute a recurrent function over a large set of state vectors and tokens.

Language Modelling
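The "merely a transformer layer" description can be made concrete: within each block, tokens attend to themselves and to a small set of recurrent state vectors, and the state vectors attend back to the block to produce the next state. The single-head NumPy sketch below is a simplification under assumed shapes and names, omitting the gating, LayerNorm, projections and MLPs of the actual cell.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def block_recurrent_step(tokens, state):
    """One recurrent step over a block of tokens.
    tokens: (block_len, d), state: (num_state_vectors, d)."""
    kv = np.vstack([tokens, state])
    # Vertical direction: tokens attend to themselves and to the current state.
    token_out = attention(tokens, kv, kv)
    # Horizontal direction: state vectors attend to tokens (and themselves),
    # i.e. the recurrent function over the set of state vectors.
    next_state = attention(state, kv, kv)
    return token_out, next_state

d, block_len, num_state = 16, 8, 4
rng = np.random.default_rng(0)
tokens, state = rng.normal(size=(block_len, d)), rng.normal(size=(num_state, d))
out, state = block_recurrent_step(tokens, state)
print(out.shape, state.shape)  # (8, 16) (4, 16)
```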

Improving Baselines in the Wild

1 code implementation • 31 Dec 2021 • Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber

We share our experience with the recently released WILDS benchmark, a collection of ten datasets dedicated to developing models and training strategies which are robust to domain shifts.

Going Beyond Linear Transformers with Recurrent Fast Weight Programmers

5 code implementations • NeurIPS 2021 • Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber

Transformers with linearised attention ("linear Transformers") have demonstrated the practical scalability and effectiveness of outer product-based Fast Weight Programmers (FWPs) from the '90s.

Atari Games • ListOps

Linear Transformers Are Secretly Fast Weight Programmers

9 code implementations • 22 Feb 2021 • Imanol Schlag, Kazuki Irie, Jürgen Schmidhuber

We show the formal equivalence of linearised self-attention mechanisms and fast weight controllers from the early '90s, where a "slow" neural net learns by gradient descent to program the "fast weights" of another net through sequences of elementary programming instructions which are additive outer products of self-invented activation patterns (today called keys and values).

Language Modelling • Machine Translation +2
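The stated equivalence is easiest to see in update-rule form: the fast weight matrix is a running sum of outer products of values and (feature-mapped) keys, and the output is that matrix applied to the query, which is exactly unnormalised linear attention. Below is a minimal NumPy sketch of this additive Fast Weight Programmer; the feature map, the omitted normalisation, and the shapes are illustrative choices rather than the paper's exact configuration.

```python
import numpy as np

def elu_plus_one(x):
    """A positive feature map as used in linear attention (an illustrative choice)."""
    return np.where(x > 0, x + 1.0, np.exp(x))

def fast_weight_sequence(keys, values, queries):
    """Fast Weight Programmer view of linear attention:
    W_t = W_{t-1} + v_t ⊗ φ(k_t), output y_t = W_t φ(q_t) (normalisation omitted)."""
    d_k, d_v = keys.shape[1], values.shape[1]
    W = np.zeros((d_v, d_k))          # the fast weight matrix, programmed on the fly
    outputs = []
    for k, v, q in zip(keys, values, queries):
        W += np.outer(v, elu_plus_one(k))       # additive outer-product "write"
        outputs.append(W @ elu_plus_one(q))     # associative "read" with the query
    return np.stack(outputs)

T, d_k, d_v = 5, 8, 8
rng = np.random.default_rng(1)
ys = fast_weight_sequence(rng.normal(size=(T, d_k)),
                          rng.normal(size=(T, d_v)),
                          rng.normal(size=(T, d_k)))
print(ys.shape)  # (5, 8)
```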

Enhancing the Transformer with Explicit Relational Encoding for Math Problem Solving

3 code implementations • 15 Oct 2019 • Imanol Schlag, Paul Smolensky, Roland Fernandez, Nebojsa Jojic, Jürgen Schmidhuber, Jianfeng Gao

We incorporate Tensor-Product Representations within the Transformer in order to better support the explicit representation of relation structure.

Math Question Answering
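A Tensor-Product Representation binds a "filler" vector to a "role" vector with an outer product and sums the bindings, so that an (approximately) orthogonal role can later unbind its filler. The sketch below shows only this generic binding/unbinding mechanism, not the TP-Transformer layer itself, whose roles and fillers come from learned projections.

```python
import numpy as np

def bind(fillers, roles):
    """TPR: sum of outer products filler_i ⊗ role_i."""
    return sum(np.outer(f, r) for f, r in zip(fillers, roles))

def unbind(tpr, role):
    """Recover the filler bound to a role; exact if the roles are orthonormal."""
    return tpr @ role

rng = np.random.default_rng(0)
roles = np.linalg.qr(rng.normal(size=(4, 4)))[0]     # orthonormal role vectors
fillers = rng.normal(size=(3, 6))                    # three filler vectors
T = bind(fillers, roles[:3])
print(np.allclose(unbind(T, roles[0]), fillers[0]))  # True
```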

Learning to Reason with Third Order Tensor Products

1 code implementation • NeurIPS 2018 • Imanol Schlag, Jürgen Schmidhuber

We combine Recurrent Neural Networks with Tensor Product Representations to learn combinatorial representations of sequential data.

Learning to Reason with Third-Order Tensor Products

1 code implementation • 29 Nov 2018 • Imanol Schlag, Jürgen Schmidhuber

We combine Recurrent Neural Networks with Tensor Product Representations to learn combinatorial representations of sequential data.

Gated Fast Weights for Associative Retrieval

no code implementations • ICLR 2018 • Imanol Schlag, Jürgen Schmidhuber

We improve previous end-to-end differentiable neural networks (NNs) with fast weight memories.

Retrieval
