Search Results for author: Thomas Lavergne

Found 35 papers, 4 papers with code

Two-Step MT: Predicting Target Morphology

no code implementations IWSLT 2016 Franck Burlot, Elena Knyazeva, Thomas Lavergne, François Yvon

This paper describes a two-step machine translation system that addresses the issue of translating into a morphologically rich language (here English to Czech) by performing the translation and the generation of target morphology as two separate steps.

Machine Translation · Translation +1
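As a rough, hedged sketch of the two-step idea (not the authors' implementation), the snippet below first translates into a morphologically underspecified target, then predicts a morphological tag per lemma and inflects it; translate_to_lemmas, predict_morph_tags and inflect are hypothetical placeholders for the two models and a morphological generator.

    # Minimal sketch of a two-step MT pipeline: translate into lemmas first,
    # then predict fine-grained target morphology and generate surface forms.
    def translate_to_lemmas(source_sentence):
        """Step 1: an MT system producing target lemmas (morphology left underspecified)."""
        return source_sentence.lower().split()  # placeholder "translation"

    def predict_morph_tags(lemmas):
        """Step 2a: a sequence labeller assigning one morphological tag per lemma."""
        return ["Case=Nom|Number=Sing"] * len(lemmas)  # placeholder tags

    def inflect(lemma, tag):
        """Step 2b: a morphological generator turning (lemma, tag) into a surface form."""
        return lemma  # placeholder: identity

    def two_step_translate(source_sentence):
        lemmas = translate_to_lemmas(source_sentence)
        tags = predict_morph_tags(lemmas)
        return " ".join(inflect(lemma, tag) for lemma, tag in zip(lemmas, tags))

    print(two_step_translate("This is an example sentence"))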

Re-train or Train from Scratch? Comparing Pre-training Strategies of BERT in the Medical Domain

no code implementations LREC 2022 Hicham El Boukkouri, Olivier Ferret, Thomas Lavergne, Pierre Zweigenbaum

BERT models used in specialized domains all seem to be the result of a simple strategy: initializing with the original BERT and then resuming pre-training on a specialized corpus.
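A hedged sketch of that strategy (initialize from the general-domain checkpoint, then continue masked-language-model pre-training on in-domain text) using Hugging Face transformers; the model name, the two placeholder sentences and the hyper-parameters are illustrative assumptions, not the paper's setup.

    import torch
    from transformers import BertTokenizerFast, BertForMaskedLM, DataCollatorForLanguageModeling

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    model = BertForMaskedLM.from_pretrained("bert-base-uncased")  # start from general-domain BERT

    # Placeholder stand-ins for a large specialized (e.g. medical) corpus.
    texts = ["patient presents with acute myocardial infarction",
             "mri shows a lesion in the left temporal lobe"]

    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)
    batch = collator([tokenizer(t) for t in texts])  # pad and apply dynamic masking

    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
    loss = model(**batch).loss   # masked-language-model loss on the specialized text
    loss.backward()
    optimizer.step()             # one step of continued (resumed) pre-training
    print(float(loss))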

CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters

2 code implementations COLING 2020 Hicham El Boukkouri, Olivier Ferret, Thomas Lavergne, Hiroshi Noji, Pierre Zweigenbaum, Junichi Tsujii

Due to the compelling improvements brought by BERT, many recent representation models adopted the Transformer architecture as their main building block, consequently inheriting the wordpiece tokenization system despite it not being intrinsically linked to the notion of Transformers.

Clinical Concept Extraction · Drug–drug Interaction Extraction +3
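As a hedged illustration of the character-level alternative CharacterBERT adopts (one vector per whole word, built from its characters, instead of wordpiece embeddings), here is a minimal PyTorch sketch; the module name, layer sizes and character vocabulary are illustrative assumptions, not the released CharacterBERT configuration.

    import torch
    import torch.nn as nn

    class CharCNNWordEncoder(nn.Module):
        """Builds one vector per word from its characters (ELMo/CharacterBERT-style sketch)."""
        def __init__(self, n_chars=262, char_dim=16, n_filters=128, kernel_size=3, out_dim=768):
            super().__init__()
            self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
            self.conv = nn.Conv1d(char_dim, n_filters, kernel_size, padding=1)
            self.proj = nn.Linear(n_filters, out_dim)

        def forward(self, char_ids):
            # char_ids: (batch, n_words, max_word_len) integer character ids
            b, w, c = char_ids.shape
            x = self.char_emb(char_ids.view(b * w, c))      # (b*w, max_word_len, char_dim)
            x = self.conv(x.transpose(1, 2))                # (b*w, n_filters, max_word_len)
            x, _ = x.max(dim=-1)                            # max-pool over characters
            return self.proj(torch.relu(x)).view(b, w, -1)  # (batch, n_words, out_dim)

    # One word-level vector per token, usable in place of wordpiece embeddings.
    encoder = CharCNNWordEncoder()
    dummy = torch.randint(1, 262, (2, 5, 12))  # 2 sentences, 5 words, 12 characters each
    print(encoder(dummy).shape)                # torch.Size([2, 5, 768])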

Handling Entity Normalization with no Annotated Corpus: Weakly Supervised Methods Based on Distributional Representation and Ontological Information

no code implementations LREC 2020 Arnaud Ferré, Robert Bossy, Mouhamadou Ba, Louise Deléger, Thomas Lavergne, Pierre Zweigenbaum, Claire Nédellec

We propose a new approach to address the scarcity of training data that extends the CONTES method by corpus selection, pre-processing and weak supervision strategies, which can yield high-performance results without any manually annotated examples.

BIG-bench Machine Learning · Entity Linking
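A much reduced sketch of how distributional representations and ontological information can support normalization without annotated pairs: embed both the mention and the ontology concept labels, then link each mention to the nearest concept by cosine similarity. The toy word vectors, the embed helper and the concept list are hypothetical; the CONTES-based method described in the paper learns a mapping between the embedding space and the ontology rather than matching labels directly.

    import numpy as np

    # Hypothetical word vectors; a real system would use embeddings trained on a large corpus.
    VECS = {
        "nerve": np.array([0.9, 0.1, 0.0]),
        "cell":  np.array([0.1, 0.9, 0.1]),
        "leaf":  np.array([0.0, 0.2, 0.9]),
    }

    def embed(text):
        """Average the word vectors of a mention or concept label (zero vector if unknown)."""
        vecs = [VECS[w] for w in text.lower().split() if w in VECS]
        return np.mean(vecs, axis=0) if vecs else np.zeros(3)

    def normalize(mention, ontology_concepts):
        """Link a textual mention to the ontology concept whose label is closest in embedding space."""
        def cosine(a, b):
            norm = np.linalg.norm(a) * np.linalg.norm(b)
            return float(a @ b / norm) if norm else 0.0
        m = embed(mention)
        return max(ontology_concepts, key=lambda concept: cosine(m, embed(concept)))

    print(normalize("nerve cell", ["nerve cell", "leaf cell", "leaf"]))  # -> "nerve cell"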

DiaBLa: A Corpus of Bilingual Spontaneous Written Dialogues for Machine Translation

2 code implementations 30 May 2019 Rachel Bawden, Sophie Rosset, Thomas Lavergne, Eric Bilinski

We provide a preliminary analysis of the corpus to confirm that the participants' judgments reveal perceptible differences in MT quality between the two MT systems used.

Machine Translation · Sentence +1

Detecting context-dependent sentences in parallel corpora

no code implementations JEPTALNRECITAL 2018 Rachel Bawden, Thomas Lavergne, Sophie Rosset

In this article, we provide several approaches to the automatic identification of parallel sentences that require sentence-external linguistic context to be correctly translated.

Machine Translation · Sentence +1

Learning the Structure of Variable-Order CRFs: a finite-state perspective

no code implementations EMNLP 2017 Thomas Lavergne, François Yvon

The computational complexity of linear-chain Conditional Random Fields (CRFs) makes it difficult to deal with very large label sets and long range dependencies.

Chunking · Feature Selection +2

Traitement automatique de la langue biomédicale au LIMSI (Biomedical language processing at LIMSI)

no code implementations JEPTALNRECITAL 2017 Christopher Norman, Cyril Grouin, Thomas Lavergne, Aurélie Névéol, Pierre Zweigenbaum

We present demonstrations of three tools developed by LIMSI for natural language processing applied to the biomedical domain: the detection of medical concepts in short texts, the categorization of scientific articles to assist the writing of systematic reviews, and the de-identification of clinical texts.

Détection de concepts et granularité de l'annotation (Concept detection and annotation granularity)

no code implementations JEPTALNRECITAL 2017 Pierre Zweigenbaum, Thomas Lavergne

We hypothesize that annotating at a finer level of granularity, typically at the statement level, should improve the performance of an automatic detector trained on these data.

A Dataset for ICD-10 Coding of Death Certificates: Creation and Usage

no code implementations WS 2016 Thomas Lavergne, Aurélie Névéol, Aude Robert, Cyril Grouin, Grégoire Rey, Pierre Zweigenbaum

Very few datasets have been released for the evaluation of diagnosis coding with the International Classification of Diseases, and only one so far in a language other than English.

Named Entity Recognition (NER)

Une catégorisation de fins de lignes non-supervisée (End-of-line classification with no supervision)

no code implementations JEPTALNRECITAL 2016 Pierre Zweigenbaum, Cyril Grouin, Thomas Lavergne

We propose a fully unsupervised method for deciding whether an end of line should be treated as a simple space or as a true boundary between textual units, and test it on a corpus of medical reports.

Etiquetage morpho-syntaxique en domaine de spécialité : le domaine médical (Part-of-speech tagging in a specialized domain: the medical domain)

no code implementations JEPTALNRECITAL 2015 Christelle Rabary, Thomas Lavergne, Aurélie Névéol

As a follow-up to this work, we plan to use the annotated clinical corpus to improve part-of-speech tagging of clinical documents in French.

Oublier ce qu'on sait, pour mieux apprendre ce qu'on ne sait pas : une étude sur les contraintes de type dans les modèles CRF (Forgetting what we know to better learn what we do not: a study of type constraints in CRF models)

no code implementations JEPTALNRECITAL 2015 Nicolas Pécheux, Alexandre Allauzen, Thomas Lavergne, Guillaume Wisniewski, François Yvon

When prior knowledge about the possible outputs of a labeling problem is available, it seems desirable to include this information during training, both to simplify the modeling task and to speed up processing.

Joint Segmentation and POS Tagging for Arabic Using a CRF-based Classifier

no code implementations LREC 2012 Souhir Gahbiche-Braham, Hélène Bonneau-Maynard, Thomas Lavergne, François Yvon

Arabic is a morphologically rich language, and Arabic texts abound in complex word forms built by concatenating multiple subparts corresponding, for instance, to prepositions, articles, roots, prefixes, or suffixes.

BIG-bench Machine Learning · Machine Translation +5
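One standard way to cast joint segmentation and POS tagging as a single sequence-labelling problem, sketched here as an illustration rather than as the authors' system, is to label each character with a composite tag combining a segment-boundary marker (B/I) and the POS of its segment, and to train a linear-chain CRF over those composite tags. The toolkit (sklearn-crfsuite), the features and the transliterated toy examples are assumptions made only to keep the example runnable.

    import sklearn_crfsuite  # illustrative toolkit choice, not the CRF tool used in the paper

    def char_features(word, i):
        """Simple per-character features; a real system would use much richer context."""
        return {
            "char": word[i],
            "prev": word[i - 1] if i > 0 else "<s>",
            "next": word[i + 1] if i < len(word) - 1 else "</s>",
            "position": i,
        }

    # Toy training data: each character carries a composite label <B|I>-<POS>, so segment
    # boundaries and POS tags are predicted jointly by a single CRF.
    words = ["wktb", "alktab"]  # transliterated stand-ins for Arabic surface forms
    labels = [
        ["B-CONJ", "B-VERB", "I-VERB", "I-VERB"],                    # w + ktb  (conjunction + verb)
        ["B-DET", "I-DET", "B-NOUN", "I-NOUN", "I-NOUN", "I-NOUN"],  # al + ktab (article + noun)
    ]

    X = [[char_features(w, i) for i in range(len(w))] for w in words]
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
    crf.fit(X, labels)
    print(crf.predict(X))  # composite labels encode both segmentation and POS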
