1 code implementation • EMNLP (WNUT) 2021 • Rob van der Goot, Alan Ramponi, Arkaitz Zubiaga, Barbara Plank, Benjamin Muller, Iñaki San Vicente Roncal, Nikola Ljubešić, Özlem Çetinoğlu, Rahmad Mahendra, Talha Çolakoğlu, Timothy Baldwin, Tommaso Caselli, Wladimir Sidorenko
This task is beneficial for downstream analysis, as it provides a way to harmonize (often spontaneous) linguistic variation.
no code implementations • 14 Oct 2021 • Benjamin Muller, Luca Soldaini, Rik Koncel-Kedziorski, Eric Lind, Alessandro Moschitti
Recent approaches for question answering systems have achieved impressive performance on English by combining document-level retrieval with answer generation.
1 code implementation • EACL 2021 • Benjamin Muller, Yanai Elazar, Benoît Sagot, Djamé Seddah
Such transfer emerges by fine-tuning on a task of interest in one language and evaluating on a distinct language, not seen during the fine-tuning.
1 code implementation • NAACL 2021 • Benjamin Muller, Antonis Anastasopoulos, Benoît Sagot, Djamé Seddah
Focusing on the latter, we show that this failure to transfer is largely related to the impact of the script used to write such languages.
no code implementations • ACL 2020 • Djamé Seddah, Farah Essaidi, Amal Fethi, Matthieu Futeral, Benjamin Muller, Pedro Javier Ortiz Suárez, Benoît Sagot, Abhishek Srivastava
We introduce the first treebank for a romanized user-generated content variety of Algerian, a North-African Arabic dialect known for its frequent usage of code-switching.
no code implementations • JEPTALNRECITAL 2020 • Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Benoît Sagot, Djamé Seddah
The practical use of these models, in all languages except English, was therefore limited. The recent release of several monolingual BERT-based models (Devlin et al., 2019), notably for French, has demonstrated the value of these models by improving the state of the art on all evaluated tasks.
no code implementations • LREC 2020 • Pedro Javier Ortiz Suárez, Yoann Dupont, Benjamin Muller, Laurent Romary, Benoît Sagot
The French TreeBank developed at the University Paris 7 is the main source of morphosyntactic and syntactic annotations for French.
no code implementations • 1 May 2020 • Benjamin Muller, Benoit Sagot, Djamé Seddah
Building natural language processing systems for non-standardized and low-resource languages is a difficult challenge.
6 code implementations • ACL 2020 • Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah, Benoît Sagot
We show that the use of web crawled data is preferable to the use of Wikipedia data.
Ranked #1 on Dependency Parsing on French GSD
no code implementations • WS 2019 • Benjamin Muller, Benoit Sagot, Djamé Seddah
In this article, focusing on User Generated Content (UGC), we study the ability of BERT to perform lexical normalisation.
no code implementations • CONLL 2018 • Ganesh Jawahar, Benjamin Muller, Amal Fethi, Louis Martin, Éric Villemonte de la Clergerie, Benoît Sagot, Djamé Seddah
We augment the deep Biaffine (BiAF) parser (Dozat and Manning, 2016) with novel features to perform competitively: we utilize an in-domain version of ELMo features (Peters et al., 2018), which provide context-dependent word representations, and disambiguated, embedded morphosyntactic features from lexicons (Sagot, 2018), which complement the existing feature set.