Search Results for author: Laurent Romary

Found 24 papers, 2 papers with code

Building A Corporate Corpus For Threads Constitution

no code implementations • RANLP 2021 • Lionel Tadonfouet Tadjou, Fabrice Bourge, Tiphaine Marie, Laurent Romary, Éric de la Clergerie

In this paper, we describe the process of building a corporate corpus that will be used as a reference for modelling and computing threads from conversations generated using communication and collaboration tools.

BERTrade: Using Contextual Embeddings to Parse Old French

no code implementations • LREC 2022 • Loïc Grobol, Mathilde Regnault, Pedro Ortiz Suarez, Benoît Sagot, Laurent Romary, Benoit Crabbé

The successes of contextual word embeddings learned by training large-scale language models, while remarkable, have mostly occurred for languages where significant amounts of raw texts are available and where annotated data in downstream tasks have a relatively regular spelling.

Dependency Parsing, POS

Conversational Grounding: Annotation and Analysis of Grounding Acts and Grounding Units

no code implementations • 25 Mar 2024 • Biswesh Mohapatra, Seemab Hassan, Laurent Romary, Justine Cassell

We discuss our key findings from the annotation and also provide a baseline model to test how well current language models categorize the grounding acts in the dialogs.

CamemBERT-bio: Leveraging Continual Pre-training for Cost-Effective Models on French Biomedical Data

no code implementations • 27 Jun 2023 • Rian Touchent, Laurent Romary, Eric de la Clergerie

However, these models are trained for plain language and are less efficient on biomedical data.

Language Modelling, Named Entity Recognition

Towards a Cleaner Document-Oriented Multilingual Crawled Corpus

no code implementations • LREC 2022 • Julien Abadji, Pedro Ortiz Suarez, Laurent Romary, Benoît Sagot

The need for large raw corpora has dramatically increased in recent years with the introduction of transfer learning and semi-supervised learning methods to Natural Language Processing.

Transfer Learning

A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages

no code implementations • ACL 2020 • Pedro Javier Ortiz Suárez, Laurent Romary, Benoît Sagot

They equal or improve upon the current state of the art in tagging and parsing for all five languages.

Part-Of-Speech Tagging, Word Embeddings

Les modèles de langue contextuels Camembert pour le français : impact de la taille et de l'hétérogénéité des données d'entrainement (CamemBERT Contextual Language Models for French: Impact of Training Data Size and Heterogeneity)

no code implementations • JEPTALNRECITAL 2020 • Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Benoît Sagot, Djamé Seddah

The practical use of these models, in all languages except English, was therefore limited. The recent release of several monolingual models based on BERT (Devlin et al., 2019), notably for French, has demonstrated the value of these models by improving the state of the art on all evaluated tasks.

SENTS

Establishing a New State-of-the-Art for French Named Entity Recognition

no code implementations • LREC 2020 • Pedro Javier Ortiz Suárez, Yoann Dupont, Benjamin Muller, Laurent Romary, Benoît Sagot

The French TreeBank developed at the University Paris 7 is the main source of morphosyntactic and syntactic annotations for French.

Named Entity Recognition

Modelling Etymology in LMF/TEI: The Grande Dicionário Houaiss da Língua Portuguesa Dictionary as a Use Case

no code implementations • LREC 2020 • Fahad Khan, Laurent Romary, Ana Salgado, Jack Bowers, Mohamed Khemakhem, Toma Tasovac

In this article, we introduce two of the new parts of the multi-part version of the Lexical Markup Framework (LMF) ISO standard, namely Part 3 (ISO 24613-3), which deals with etymological and diachronic data, and Part 4 (ISO 24613-4), which consists of a TEI serialisation of all of the prior parts of the model.

CamemBERT: a Tasty French Language Model

6 code implementations • ACL 2020 • Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah, Benoît Sagot

We show that the use of web-crawled data is preferable to the use of Wikipedia data.

Ranked #1 on Dependency Parsing on French GSD

Dependency Parsing, Language Modelling

Automatic Identification and Normalisation of Physical Measurements in Scientific Literature

1 code implementation • Document Engineering 2019 • Luca Foppiano, Laurent Romary, Masashi Ishii, Mikiko Tanifuji

Normalised materials characteristics (such as critical temperature and pressure) extracted from scientific literature are a key resource for materials informatics (MI) [9].

NER

LMF Reloaded

no code implementations • 23 May 2019 • Laurent Romary, Mohamed Khemakhem, Fahad Khan, Jack Bowers, Nicoletta Calzolari, Monte George, Mandy Pet, Piotr Bański

Lexical Markup Framework (LMF) or ISO 24613 [1] is a de jure standard that provides a framework for modelling and encoding lexical information in retrodigitised print dictionaries and NLP lexical databases.

TBX in ODD: Schema-agnostic specification and documentation for TermBase eXchange

no code implementations • WS 2017 • Stefan Pernes, Laurent Romary

Deep encoding of etymological information in TEI

no code implementations • 30 Nov 2016 • Jack Bowers, Laurent Romary

This paper aims to provide a comprehensive modeling and representation of etymological data in digital dictionaries.

TermITH-Eval: a French Standard-Based Resource for Keyphrase Extraction Evaluation

no code implementations • LREC 2016 • Adrien Bougouin, Sabine Barreaux, Laurent Romary, Florian Boudin, Béatrice Daille

The output keyphrases of automatic keyphrase extraction methods for test documents are typically evaluated by comparing them to manually assigned reference keyphrases.

Keyphrase Extraction

Data fluidity in DARIAH -- pushing the agenda forward

no code implementations • 10 Mar 2016 • Laurent Romary, Mike Mertens, Anne Baillot

This paper provides both an update concerning the setting up of the European DARIAH infrastructure and a series of strong action lines related to the development of a data-centred strategy for the humanities in the coming years.

Management

Standards for language resources in ISO -- Looking back at 13 fruitful years

no code implementations • 27 Oct 2015 • Laurent Romary

This paper provides an overview of the various projects carried out within ISO committee TC 37/SC 4 dealing with the management of language (digital) resources.

Management

Automatic Construction of a TMF Terminological Database using a Transducer Cascade

no code implementations • RANLP 2015 • Chihebeddine Ammar, Kais Haddar, Laurent Romary

Méthodes pour la représentation informatisée de données lexicales / Methoden der Speicherung lexikalischer Daten (Methods for the Computerised Representation of Lexical Data / Methods for Storing Lexical Data)

no code implementations • 15 May 2014 • Laurent Romary, Andreas Witt

In recent years, new developments in the area of lexicography have altered not only the management, processing and publishing of lexicographical data, but also created new types of products such as electronic dictionaries and thesauri.

Management, Translation

TBX goes TEI -- Implementing a TBX basic extension for the Text Encoding Initiative guidelines

no code implementations • 1 Mar 2014 • Laurent Romary

This paper presents an attempt to customise the TEI (Text Encoding Initiative) guidelines in order to offer the possibility to incorporate TBX (TermBase eXchange) based terminological entries within any kind of TEI documents.

Book Review: Natural Language Processing for Historical Texts by Michael Piotrowski

no code implementations • CL 2014 • Laurent Romary

TEI and LMF crosswalks

no code implementations • 11 Jan 2013 • Laurent Romary

The present paper explores various arguments in favour of making the Text Encoding Initiative (TEI) guidelines an appropriate serialisation for ISO standard 24613:2008 (LMF, Lexical Markup Framework).

Collaborative Machine Translation Service for Scientific texts

no code implementations • EACL 2012 • Patrik Lambert, Jean Senellart, Laurent Romary, Holger Schwenk, Florian Zipser, Patrice Lopez, Frédéric Blain

Machine Translation, Translation

Serialising the ISO SynAF Syntactic Object Model

no code implementations • 2 Aug 2011 • Laurent Romary, Amir Zeldes, Florian Zipser

This paper introduces an XML format developed to serialise the object model defined by the ISO Syntactic Annotation Framework SynAF.

Object
