Search Results for author: Benoît Sagot

Found 45 papers, 4 papers with code

Les modèles de langue contextuels Camembert pour le français : impact de la taille et de l'hétérogénéité des données d'entrainement (CamemBERT Contextual Language Models for French: Impact of Training Data Size and Heterogeneity)

no code implementations JEPTALNRECITAL 2020 Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Benoît Sagot, Djamé Seddah

The practical use of these models, in all languages except English, was therefore limited. The recent release of several monolingual models based on BERT (Devlin et al., 2019), notably for French, has demonstrated the value of these models by improving the state of the art on all evaluated tasks.

es-en SENTS

Methodological Aspects of Developing and Managing an Etymological Lexical Resource: Introducing EtymDB-2.0

1 code implementation LREC 2020 Clémentine Fourrier, Benoît Sagot

Diachronic lexical information is not only important in the field of historical linguistics, but is also increasingly used in NLP, most recently for machine translation of low resource languages.

Cognate Prediction Machine Translation +1

OFrLex: A Computational Morphological and Syntactic Lexicon for Old French

no code implementations LREC 2020 Gaël Guibon, Benoît Sagot

In this paper we describe our work on the development and enrichment of OFrLex, a freely available, large-coverage morphological and syntactic Old French lexicon.

Dependency Parsing Part-Of-Speech Tagging

Comparing Statistical and Neural Models for Learning Sound Correspondences

1 code implementation LREC 2020 Clémentine Fourrier, Benoît Sagot

Cognate prediction and proto-form reconstruction are key tasks in computational historical linguistics that rely on the study of sound change regularity.

Cognate Prediction Machine Translation +1

What Does BERT Learn about the Structure of Language?

1 code implementation ACL 2019 Ganesh Jawahar, Benoît Sagot, Djamé Seddah

BERT is a recent language representation model that has performed surprisingly well in diverse language understanding benchmarks.

Développement d'un lexique morphologique et syntaxique de l'ancien français (Development of a morphological and syntactic lexicon of Old French)

no code implementations JEPTALNRECITAL 2019 Benoît Sagot

In this article, we describe our work on developing a large-scale morphological and syntactic lexicon of Old French for natural language processing.

ELMoLex: Connecting ELMo and Lexicon Features for Dependency Parsing

no code implementations CONLL 2018 Ganesh Jawahar, Benjamin Muller, Amal Fethi, Louis Martin, Éric Villemonte de la Clergerie, Benoît Sagot, Djamé Seddah

We augment the deep Biaffine (BiAF) parser (Dozat and Manning, 2016) with novel features to perform competitively: we utilize an in-domain version of ELMo features (Peters et al., 2018), which provide context-dependent word representations, and disambiguated, embedded morphosyntactic features from lexicons (Sagot, 2018), which complement the existing feature set.

Dependency Parsing Language Modelling

Improving neural tagging with lexical information

no code implementations WS 2017 Benoît Sagot, Héctor Martínez Alonso

Neural part-of-speech tagging has achieved competitive results with the incorporation of character-based and pre-trained word embeddings.

Part-Of-Speech Tagging Word Embeddings

Speeding up corpus development for linguistic research: language documentation and acquisition in Romansh Tuatschin

no code implementations WS 2017 Géraldine Walther, Benoît Sagot

In this paper, we present ongoing work for developing language resources and basic NLP tools for an undocumented variety of Romansh, in the context of a language documentation and language acquisition project.

Language Acquisition Spelling Correction

Étiquetage multilingue en parties du discours avec MElt (Multilingual part-of-speech tagging with MElt)

no code implementations JEPTALNRECITAL 2016 Benoît Sagot

We present recent work on MElt, a discriminative part-of-speech tagging system.

Part-Of-Speech Tagging

DeLex, a freely-available, large-scale and linguistically grounded morphological lexicon for German

no code implementations LREC 2014 Benoît Sagot

We introduce DeLex, a freely available, large-scale and linguistically grounded morphological lexicon for German developed within the Alexina framework.

Morphological Analysis Morphological Inflection

A language-independent and fully unsupervised approach to lexicon induction and part-of-speech tagging for closely related languages

no code implementations LREC 2014 Yves Scherrer, Benoît Sagot

In this paper, we describe our generic approach for transferring part-of-speech annotations from a resourced language towards an etymologically closely related non-resourced language, without using any bilingual (i.e., parallel) data.

Part-Of-Speech Tagging POS +1

A Language-independent Approach to Extracting Derivational Relations from an Inflectional Lexicon

no code implementations LREC 2014 Marion Baranes, Benoît Sagot

In this paper, we describe and evaluate an unsupervised method for acquiring pairs of lexical entries belonging to the same morphological family, i.e., derivationally related words, starting from a purely inflectional lexicon.

Morphological Analysis Question Answering

An Open-Source Heavily Multilingual Translation Graph Extracted from Wiktionaries and Parallel Corpora

no code implementations LREC 2014 Valérie Hanoka, Benoît Sagot

This paper describes YaMTG (Yet another Multilingual Translation Graph), a new open-source heavily multilingual translation database (over 664 languages represented) built using several sources, namely various wiktionaries and the OPUS parallel corpora (Tiedemann, 2009).

Translation

Evaluating and improving syntactic lexica by plugging them within a parser

no code implementations LREC 2012 Elsa Tolone, Benoît Sagot, Éric Villemonte de la Clergerie

We present some evaluation results for four French syntactic lexica, obtained through their conversion to the Alexina format used by the Lefff lexicon, and their integration within the large-coverage TAG-based FRMG parser.

TAG

Applying cross-lingual WSD to wordnet development

no code implementations LREC 2012 Marianna Apidianaki, Benoît Sagot

The automatic development of semantic resources constitutes an important challenge in the NLP community.

Word Sense Induction

Aleda, a free large-scale entity database for French

no code implementations LREC 2012 Benoît Sagot, Rosa Stern

For such a purpose, often referred to as entity resolution and linking, an inventory of entities is required in order to constitute a reference.

Entity Linking Entity Resolution +4

Wordnet extension made simple: A multilingual lexicon-based approach using wiki resources

no code implementations LREC 2012 Valérie Hanoka, Benoît Sagot

In this paper, we propose a simple methodology for building or extending wordnets using easily extractable lexical knowledge from Wiktionary and Wikipedia.

Translation Word Sense Disambiguation

Cleaning noisy wordnets

no code implementations LREC 2012 Benoît Sagot, Darja Fišer

Manual evaluation of the results shows that by applying a threshold similar to the estimated error rate in the respective wordnets, 67% of the proposed outlier candidates are indeed incorrect for French and 64% for Slovene.

Semantic Textual Similarity Word Sense Disambiguation
