no code implementations • RANLP 2021 • Lionel Tadonfouet Tadjou, Fabrice Bourge, Tiphaine Marie, Laurent Romary, Éric de la Clergerie
In this paper we describe the process of building a corporate corpus that will be used as a reference for modelling and computing threads from conversations generated using communication and collaboration tools.
no code implementations • LREC 2022 • Loïc Grobol, Mathilde Regnault, Pedro Ortiz Suarez, Benoît Sagot, Laurent Romary, Benoit Crabbé
The successes of contextual word embeddings learned by training large-scale language models, while remarkable, have mostly occurred for languages where significant amounts of raw texts are available and where annotated data in downstream tasks have a relatively regular spelling.
no code implementations • 25 Mar 2024 • Biswesh Mohapatra, Seemab Hassan, Laurent Romary, Justine Cassell
We discuss our key findings during the annotation and also provide a baseline model to test the performance of current Language Models in categorizing the grounding acts of the dialogs.
no code implementations • 27 Jun 2023 • Rian Touchent, Laurent Romary, Eric de la Clergerie
However, these models are trained for plain language and are less efficient on biomedical data.
no code implementations • LREC 2022 • Julien Abadji, Pedro Ortiz Suarez, Laurent Romary, Benoît Sagot
The need for large raw corpora has dramatically increased in recent years with the introduction of transfer learning and semi-supervised learning methods to Natural Language Processing.
no code implementations • ACL 2020 • Pedro Javier Ortiz Suárez, Laurent Romary, Benoît Sagot
They actually equal or improve the current state of the art in tagging and parsing for all five languages.
no code implementations • JEPTALNRECITAL 2020 • Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Benoît Sagot, Djamé Seddah
The practical use of these models, in all languages except English, was therefore limited. The recent release of several monolingual BERT-based models (Devlin et al., 2019), notably for French, has demonstrated the value of these models by improving the state of the art on all evaluated tasks.
no code implementations • LREC 2020 • Pedro Javier Ortiz Suárez, Yoann Dupont, Benjamin Muller, Laurent Romary, Benoît Sagot
The French TreeBank developed at the University Paris 7 is the main source of morphosyntactic and syntactic annotations for French.
no code implementations • LREC 2020 • Fahad Khan, Laurent Romary, Ana Salgado, Jack Bowers, Mohamed Khemakhem, Toma Tasovac
In this article we introduce two of the new parts of the multi-part version of the Lexical Markup Framework (LMF) ISO standard, namely Part 3 (ISO 24613-3), which deals with etymological and diachronic data, and Part 4 (ISO 24613-4), which consists of a TEI serialisation of all of the prior parts of the model.
6 code implementations • ACL 2020 • Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah, Benoît Sagot
We show that the use of web crawled data is preferable to the use of Wikipedia data.
Ranked #1 on Dependency Parsing on French GSD
1 code implementation • Document Engineering 2019 • Luca Foppiano, Laurent Romary, Masashi Ishii, Mikiko Tanifuji
Normalised materials characteristics (such as critical temperature, pressure) extracted from scientific literature are a key resource for materials informatics (MI) [9].
no code implementations • 23 May 2019 • Laurent Romary, Mohamed Khemakhem, Fahad Khan, Jack Bowers, Nicoletta Calzolari, Monte George, Mandy Pet, Piotr Bański
Lexical Markup Framework (LMF) or ISO 24613 [1] is a de jure standard that provides a framework for modelling and encoding lexical information in retrodigitised print dictionaries and NLP lexical databases.
no code implementations • 30 Nov 2016 • Jack Bowers, Laurent Romary
This paper aims to provide a comprehensive modeling and representation of etymological data in digital dictionaries.
no code implementations • LREC 2016 • Adrien Bougouin, Sabine Barreaux, Laurent Romary, Florian Boudin, Béatrice Daille
The output keyphrases of automatic keyphrase extraction methods for test documents are typically evaluated by comparing them to manually assigned reference keyphrases.
no code implementations • 10 Mar 2016 • Laurent Romary, Mike Mertens, Anne Baillot
This paper provides both an update concerning the setting up of the European DARIAH infrastructure and a series of strong action lines related to the development of a data centred strategy for the humanities in the coming years.
no code implementations • 27 Oct 2015 • Laurent Romary
This paper provides an overview of the various projects carried out within ISO committee TC 37/SC 4 dealing with the management of language (digital) resources.
no code implementations • 15 May 2014 • Laurent Romary, Andreas Witt
In recent years, new developments in the area of lexicography have not only altered the management, processing and publishing of lexicographical data, but also created new types of products such as electronic dictionaries and thesauri.
no code implementations • 1 Mar 2014 • Laurent Romary
This paper presents an attempt to customise the TEI (Text Encoding Initiative) guidelines in order to offer the possibility to incorporate TBX (TermBase eXchange) based terminological entries within any kind of TEI documents.
no code implementations • 11 Jan 2013 • Laurent Romary
The present paper explores various arguments in favour of making the Text Encoding Initiative (TEI) guidelines an appropriate serialisation for ISO standard 24613:2008 (LMF, Lexical Mark-up Framework).
no code implementations • 2 Aug 2011 • Laurent Romary, Amir Zeldes, Florian Zipser
This paper introduces an XML format developed to serialise the object model defined by the ISO Syntactic Annotation Framework SynAF.