no code implementations • ACL 2020 • Djamé Seddah, Farah Essaidi, Amal Fethi, Matthieu Futeral, Benjamin Muller, Pedro Javier Ortiz Suárez, Benoît Sagot, Abhishek Srivastava
We introduce the first treebank for a romanized user-generated content variety of Algerian, a North-African Arabic dialect known for its frequent usage of code-switching.
no code implementations • JEPTALNRECITAL 2020 • Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Benoît Sagot, Djamé Seddah
The practical use of these models, in all languages except English, was therefore limited. The recent release of several monolingual BERT-based models (Devlin et al., 2019), notably for French, has demonstrated the value of these models by improving the state of the art for all evaluated tasks.
1 code implementation • LREC 2020 • Clémentine Fourrier, Benoît Sagot
Diachronic lexical information is not only important in the field of historical linguistics, but is also increasingly used in NLP, most recently for machine translation of low resource languages.
no code implementations • LREC 2020 • Gaël Guibon, Benoît Sagot
In this paper we describe our work on the development and enrichment of OFrLex, a freely available, large-coverage morphological and syntactic Old French lexicon.
no code implementations • LREC 2020 • Murielle Popa-Fabre, Pedro Javier Ortiz Suárez, Benoît Sagot, Éric de la Clergerie
This paper investigates the impact of different types and sizes of training corpora on language models.
1 code implementation • LREC 2020 • Clémentine Fourrier, Benoît Sagot
Cognate prediction and proto-form reconstruction are key tasks in computational historical linguistics that rely on the study of sound change regularity.
1 code implementation • ACL 2019 • Ganesh Jawahar, Benoît Sagot, Djamé Seddah
BERT is a recent language representation model that has performed surprisingly well in diverse language understanding benchmarks.
no code implementations • JEPTALNRECITAL 2019 • Benoît Sagot
In this article we describe our work on developing a large-scale morphological and syntactic lexicon of Old French for natural language processing.
no code implementations • CONLL 2018 • Ganesh Jawahar, Benjamin Muller, Amal Fethi, Louis Martin, Éric Villemonte de la Clergerie, Benoît Sagot, Djamé Seddah
We augment the deep Biaffine (BiAF) parser (Dozat and Manning, 2016) with novel features to perform competitively: we utilize an in-domain version of ELMo features (Peters et al., 2018), which provide context-dependent word representations; we utilize disambiguated, embedded, morphosyntactic features from lexicons (Sagot, 2018), which complement the existing feature set.
no code implementations • WS 2017 • Benoît Sagot, Héctor Martínez Alonso
Neural part-of-speech tagging has achieved competitive results with the incorporation of character-based and pre-trained word embeddings.
no code implementations • CONLL 2017 • Éric de la Clergerie, Benoît Sagot, Djamé Seddah
We present the ParisNLP entry at the UD CoNLL 2017 parsing shared task.
no code implementations • WS 2017 • Géraldine Walther, Benoît Sagot
In this paper, we present ongoing work for developing language resources and basic NLP tools for an undocumented variety of Romansh, in the context of a language documentation and language acquisition project.
no code implementations • JEPTALNRECITAL 2017 • Benoît Sagot
Electronic lexical resources almost never contain etymological information.
1 code implementation • WS 2017 • Héctor Martínez Alonso, Amaury Delamaire, Benoît Sagot
We focus on the identification of omission in statement pairs.
no code implementations • WS 2016 • Héctor Martínez Alonso, Djamé Seddah, Benoît Sagot
User-generated content presents many challenges for its automatic processing.
no code implementations • JEPTALNRECITAL 2016 • Benoît Sagot
We present recent work carried out on MElt, a discriminative part-of-speech tagging system.
no code implementations • LREC 2014 • Benoît Sagot
We introduce DeLex, a freely available, large-scale and linguistically grounded morphological lexicon for German developed within the Alexina framework.
no code implementations • LREC 2014 • Yves Scherrer, Benoît Sagot
In this paper, we describe our generic approach for transferring part-of-speech annotations from a resourced language towards an etymologically closely related non-resourced language, without using any bilingual (i.e., parallel) data.
no code implementations • LREC 2014 • Marion Baranes, Benoît Sagot
In this paper, we describe and evaluate an unsupervised method for acquiring pairs of lexical entries belonging to the same morphological family, i.e., derivationally related words, starting from a purely inflectional lexicon.
no code implementations • LREC 2014 • Marie Candito, Pascal Amsili, Lucie Barque, Farah Benamara, Gaël de Chalendar, Marianne Djemaa, Pauline Haas, Richard Huyghe, Yvette Yannick Mathieu, Philippe Muller, Benoît Sagot, Laure Vieu
The Asfalda project aims to develop a French corpus with frame-based semantic annotations and automatic tools for shallow semantic analysis.
no code implementations • LREC 2014 • Valérie Hanoka, Benoît Sagot
This paper describes YaMTG (Yet another Multilingual Translation Graph), a new open-source heavily multilingual translation database (over 664 languages represented) built using several sources, namely various wiktionaries and the OPUS parallel corpora (Tiedemann, 2009).
no code implementations • LREC 2012 • Elsa Tolone, Benoît Sagot, Éric Villemonte de la Clergerie
We present some evaluation results for four French syntactic lexica, obtained through their conversion to the Alexina format used by the Lefff lexicon, and their integration within the large-coverage TAG-based FRMG parser.
no code implementations • LREC 2012 • Marianna Apidianaki, Benoît Sagot
The automatic development of semantic resources constitutes an important challenge in the NLP community.
no code implementations • LREC 2012 • Benoît Sagot, Rosa Stern
For such a purpose, often referred to as entity resolution and linking, an inventory of entities is required in order to constitute a reference.
no code implementations • LREC 2012 • Kata Gábor, Marianna Apidianaki, Benoît Sagot, Éric Villemonte de la Clergerie
In this article, we present a distributional analysis method for extracting nominalization relations from monolingual corpora.
no code implementations • LREC 2012 • Valérie Hanoka, Benoît Sagot
In this paper, we propose a simple methodology for building or extending wordnets using easily extractible lexical knowledge from Wiktionary and Wikipedia.
no code implementations • LREC 2012 • Benoît Sagot, Darja Fišer
Manual evaluation of the results shows that by applying a threshold similar to the estimated error rate in the respective wordnets, 67% of the proposed outlier candidates are indeed incorrect for French and 64% for Slovene.