no code implementations • 11 Apr 2024 • Nathan Godey, Éric de la Clergerie, Benoît Sagot
In this paper, we find that such saturation can be explained by a mismatch between the hidden dimension of smaller models and the high rank of the target contextual probability distribution.
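The mismatch described here is a rank argument: the logits of a softmax language-model head are a product of hidden states and output embeddings, so their rank is bounded by the hidden dimension. A minimal sketch (hypothetical toy values, not the paper's code) makes this bound concrete:

```python
import numpy as np

# Toy illustration of the rank bottleneck: logits = H @ E.T can have rank
# at most d (the hidden dimension), regardless of vocabulary size, so a
# small d cannot match a high-rank target contextual distribution.
rng = np.random.default_rng(0)
d, vocab, n_ctx = 8, 100, 50            # small hidden dim, larger vocabulary
H = rng.normal(size=(n_ctx, d))         # hidden states for n_ctx contexts
E = rng.normal(size=(vocab, d))         # output embedding matrix
logits = H @ E.T                        # (n_ctx, vocab) logit matrix
print(np.linalg.matrix_rank(logits))    # at most d = 8, far below vocab
```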
no code implementations • 29 Feb 2024 • Nathan Godey, Éric de la Clergerie, Benoît Sagot
Language models have long been shown to embed geographical information in their hidden representations.
no code implementations • 22 Jan 2024 • Nathan Godey, Éric de la Clergerie, Benoît Sagot
The representation degeneration problem is widely observed in Transformer-based self-supervised learning methods.
no code implementations • 15 Sep 2023 • Nathan Godey, Éric de la Clergerie, Benoît Sagot
Self-supervised pre-training of language models usually consists of predicting probability distributions over large token vocabularies.
no code implementations • 13 Jun 2023 • Nathan Godey, Éric de la Clergerie, Benoît Sagot
The representation degeneration problem is widely observed in Transformer-based self-supervised learning methods.
no code implementations • 14 Dec 2022 • Nathan Godey, Roman Castagné, Éric de la Clergerie, Benoît Sagot
The resulting system offers a trade-off between the expressiveness of byte-level models and the speed of models trained using subword tokenization.
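The speed side of this trade-off comes from sequence length: a byte-level model processes one token per byte, while a subword tokenizer emits far fewer units. A hypothetical comparison (crude whitespace splitting stands in for a real subword tokenizer, which is an assumption for illustration only):

```python
# Byte-level models see much longer sequences than subword models,
# which is why subword tokenization is faster per input character.
text = "Self-supervised pre-training of language models"
byte_seq = list(text.encode("utf-8"))   # one token per byte
subword_seq = text.split()              # crude whitespace "subwords"
print(len(byte_seq), len(subword_seq))  # byte sequence is far longer
```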