no code implementations • RANLP 2021 • Lionel Tadonfouet Tadjou, Fabrice Bourge, Tiphaine Marie, Laurent Romary, Éric de la Clergerie
In this paper we describe the process of building a corporate corpus that will be used as a reference for modelling and computing threads from conversations generated using communication and collaboration tools.
no code implementations • 11 Apr 2024 • Nathan Godey, Éric de la Clergerie, Benoît Sagot
In this paper, we find that such saturation can be explained by a mismatch between the hidden dimension of smaller models and the high rank of the target contextual probability distribution.
no code implementations • 29 Feb 2024 • Nathan Godey, Éric de la Clergerie, Benoît Sagot
Language models have long been shown to embed geographical information in their hidden representations.
no code implementations • 22 Jan 2024 • Nathan Godey, Éric de la Clergerie, Benoît Sagot
The representation degeneration problem is a phenomenon that is widely observed among self-supervised learning methods based on Transformers.
no code implementations • 15 Sep 2023 • Nathan Godey, Éric de la Clergerie, Benoît Sagot
Self-supervised pre-training of language models usually consists in predicting probability distributions over extensive token vocabularies.
no code implementations • 13 Jun 2023 • Nathan Godey, Éric de la Clergerie, Benoît Sagot
The representation degeneration problem is a phenomenon that is widely observed among self-supervised learning methods based on Transformers.
no code implementations • 14 Dec 2022 • Nathan Godey, Roman Castagné, Éric de la Clergerie, Benoît Sagot
The resulting system offers a trade-off between the expressiveness of byte-level models and the speed of models trained using subword tokenization.
no code implementations • 18 Nov 2020 • Fuqi Song, Éric de la Clergerie
In contract analysis and contract automation, a knowledge base (KB) of legal entities is fundamental for performing tasks such as contract verification, contract generation and contract analytics.
1 code implementation • LREC 2022 • Louis Martin, Angela Fan, Éric de la Clergerie, Antoine Bordes, Benoît Sagot
Progress in sentence simplification has been hindered by a lack of labeled parallel simplification data, particularly in languages other than English.
Ranked #2 on Text Simplification on ASSET
2 code implementations • LREC 2020 • Louis Martin, Benoît Sagot, Éric de la Clergerie, Antoine Bordes
Text simplification aims at making a text easier to read and understand by simplifying grammar and structure while keeping the underlying information identical.
Ranked #3 on Text Simplification on ASSET