Search Results for author: Thomas Lavergne

Found 35 papers, 4 papers with code

Two-Step MT: Predicting Target Morphology

no code implementations IWSLT 2016 Franck Burlot, Elena Knyazeva, Thomas Lavergne, François Yvon

This paper describes a two-step machine translation system that addresses the issue of translating into a morphologically rich language (here English to Czech) by performing the translation and the generation of target morphology as two separate steps.

Machine Translation · Translation +1
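As a rough, hedged sketch of the two-step idea (not the authors' implementation), the snippet below first translates into a morphologically underspecified target, then predicts a morphological tag per lemma and inflects it; translate_to_lemmas, predict_morph_tags and inflect are hypothetical placeholders for the two models and a morphological generator.

    # Minimal sketch of a two-step MT pipeline: translate into lemmas first,
    # then predict fine-grained target morphology and generate surface forms.
    def translate_to_lemmas(source_sentence):
        """Step 1: an MT system producing target lemmas (morphology left underspecified)."""
        return source_sentence.lower().split()  # placeholder "translation"

    def predict_morph_tags(lemmas):
        """Step 2a: a sequence labeller assigning one morphological tag per lemma."""
        return ["Case=Nom|Number=Sing"] * len(lemmas)  # placeholder tags

    def inflect(lemma, tag):
        """Step 2b: a morphological generator turning (lemma, tag) into a surface form."""
        return lemma  # placeholder: identity

    def two_step_translate(source_sentence):
        lemmas = translate_to_lemmas(source_sentence)
        tags = predict_morph_tags(lemmas)
        return " ".join(inflect(lemma, tag) for lemma, tag in zip(lemmas, tags))

    print(two_step_translate("This is an example sentence"))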

Re-train or Train from Scratch? Comparing Pre-training Strategies of BERT in the Medical Domain

no code implementations LREC 2022 Hicham El Boukkouri, Olivier Ferret, Thomas Lavergne, Pierre Zweigenbaum

BERT models used in specialized domains all seem to be the result of a simple strategy: initializing with the original BERT and then resuming pre-training on a specialized corpus.
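A hedged sketch of that strategy (initialize from the general-domain checkpoint, then continue masked-language-model pre-training on in-domain text) using Hugging Face transformers; the model name, the two placeholder sentences and the hyper-parameters are illustrative assumptions, not the paper's setup.

    import torch
    from transformers import BertTokenizerFast, BertForMaskedLM, DataCollatorForLanguageModeling

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    model = BertForMaskedLM.from_pretrained("bert-base-uncased")  # start from general-domain BERT

    # Placeholder stand-ins for a large specialized (e.g. medical) corpus.
    texts = ["patient presents with acute myocardial infarction",
             "mri shows a lesion in the left temporal lobe"]

    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)
    batch = collator([tokenizer(t) for t in texts])  # pad and apply dynamic masking

    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
    loss = model(**batch).loss   # masked-language-model loss on the specialized text
    loss.backward()
    optimizer.step()             # one step of continued (resumed) pre-training
    print(float(loss))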

CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters

2 code implementations COLING 2020 Hicham El Boukkouri, Olivier Ferret, Thomas Lavergne, Hiroshi Noji, Pierre Zweigenbaum, Junichi Tsujii

Due to the compelling improvements brought by BERT, many recent representation models adopted the Transformer architecture as their main building block, consequently inheriting the wordpiece tokenization system despite it not being intrinsically linked to the notion of Transformers.

Clinical Concept Extraction · Drug–drug Interaction Extraction +3
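As a hedged illustration of the character-level alternative CharacterBERT adopts (one vector per whole word, built from its characters, instead of wordpiece embeddings), here is a minimal PyTorch sketch; the module name, layer sizes and character vocabulary are illustrative assumptions, not the released CharacterBERT configuration.

    import torch
    import torch.nn as nn

    class CharCNNWordEncoder(nn.Module):
        """Builds one vector per word from its characters (ELMo/CharacterBERT-style sketch)."""
        def __init__(self, n_chars=262, char_dim=16, n_filters=128, kernel_size=3, out_dim=768):
            super().__init__()
            self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
            self.conv = nn.Conv1d(char_dim, n_filters, kernel_size, padding=1)
            self.proj = nn.Linear(n_filters, out_dim)

        def forward(self, char_ids):
            # char_ids: (batch, n_words, max_word_len) integer character ids
            b, w, c = char_ids.shape
            x = self.char_emb(char_ids.view(b * w, c))      # (b*w, max_word_len, char_dim)
            x = self.conv(x.transpose(1, 2))                # (b*w, n_filters, max_word_len)
            x, _ = x.max(dim=-1)                            # max-pool over characters
            return self.proj(torch.relu(x)).view(b, w, -1)  # (batch, n_words, out_dim)

    # One word-level vector per token, usable in place of wordpiece embeddings.
    encoder = CharCNNWordEncoder()
    dummy = torch.randint(1, 262, (2, 5, 12))  # 2 sentences, 5 words, 12 characters each
    print(encoder(dummy).shape)                # torch.Size([2, 5, 768])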

Handling Entity Normalization with no Annotated Corpus: Weakly Supervised Methods Based on Distributional Representation and Ontological Information

no code implementations LREC 2020 Arnaud Ferré, Robert Bossy, Mouhamadou Ba, Louise Deléger, Thomas Lavergne, Pierre Zweigenbaum, Claire Nédellec

We propose a new approach to address the scarcity of training data that extends the CONTES method by corpus selection, pre-processing and weak supervision strategies, which can yield high-performance results without any manually annotated examples.

BIG-bench Machine Learning · Entity Linking
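A much reduced sketch of how distributional representations and ontological information can support normalization without annotated pairs: embed both the mention and the ontology concept labels, then link each mention to the nearest concept by cosine similarity. The toy word vectors, the embed helper and the concept list are hypothetical; the CONTES-based method described in the paper learns a mapping between the embedding space and the ontology rather than matching labels directly.

    import numpy as np

    # Hypothetical word vectors; a real system would use embeddings trained on a large corpus.
    VECS = {
        "nerve": np.array([0.9, 0.1, 0.0]),
        "cell":  np.array([0.1, 0.9, 0.1]),
        "leaf":  np.array([0.0, 0.2, 0.9]),
    }

    def embed(text):
        """Average the word vectors of a mention or concept label (zero vector if unknown)."""
        vecs = [VECS[w] for w in text.lower().split() if w in VECS]
        return np.mean(vecs, axis=0) if vecs else np.zeros(3)

    def normalize(mention, ontology_concepts):
        """Link a textual mention to the ontology concept whose label is closest in embedding space."""
        def cosine(a, b):
            norm = np.linalg.norm(a) * np.linalg.norm(b)
            return float(a @ b / norm) if norm else 0.0
        m = embed(mention)
        return max(ontology_concepts, key=lambda concept: cosine(m, embed(concept)))

    print(normalize("nerve cell", ["nerve cell", "leaf cell", "leaf"]))  # -> "nerve cell"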

DiaBLa: A Corpus of Bilingual Spontaneous Written Dialogues for Machine Translation

2 code implementations 30 May 2019 Rachel Bawden, Sophie Rosset, Thomas Lavergne, Eric Bilinski

We provide a preliminary analysis of the corpus to confirm that the participants' judgments reveal perceptible differences in MT quality between the two MT systems used.

Machine Translation · Sentence +1

Detecting context-dependent sentences in parallel corpora

no code implementations JEPTALNRECITAL 2018 Rachel Bawden, Thomas Lavergne, Sophie Rosset

In this article, we provide several approaches to the automatic identification of parallel sentences that require sentence-external linguistic context to be correctly translated.

Machine Translation · Sentence +1

Learning the Structure of Variable-Order CRFs: a finite-state perspective

no code implementations EMNLP 2017 Thomas Lavergne, François Yvon

The computational complexity of linear-chain Conditional Random Fields (CRFs) makes it difficult to deal with very large label sets and long range dependencies.

Chunking · Feature Selection +2

Traitement automatique de la langue biomédicale au LIMSI (Biomedical language processing at LIMSI)

no code implementations JEPTALNRECITAL 2017 Christopher Norman, Cyril Grouin, Thomas Lavergne, Aurélie Névéol, Pierre Zweigenbaum

We present demonstrations of three tools developed by LIMSI for natural language processing applied to the biomedical domain: the detection of medical concepts in short texts, the categorization of scientific articles to assist the writing of systematic reviews, and the de-identification of clinical texts.

Détection de concepts et granularité de l'annotation (Concept detection and annotation granularity)

no code implementations JEPTALNRECITAL 2017 Pierre Zweigenbaum, Thomas Lavergne

We hypothesize that annotating at a finer level of granularity, typically at the statement level, should improve the performance of an automatic detector trained on these data.

A Dataset for ICD-10 Coding of Death Certificates: Creation and Usage

no code implementations WS 2016 Thomas Lavergne, Aurélie Névéol, Aude Robert, Cyril Grouin, Grégoire Rey, Pierre Zweigenbaum

Very few datasets have been released for the evaluation of diagnosis coding with the International Classification of Diseases, and only one so far in a language other than English.

Named Entity Recognition (NER)

Une catégorisation de fins de lignes non-supervisée (End-of-line classification with no supervision)

no code implementations JEPTALNRECITAL 2016 Pierre Zweigenbaum, Cyril Grouin, Thomas Lavergne

We propose a fully unsupervised method for deciding whether an end of line should be treated as a simple space or as a true boundary between textual units, and test it on a corpus of medical reports.

Etiquetage morpho-syntaxique en domaine de spécialité : le domaine médical (Part-of-speech tagging in a specialized domain: the medical domain)

no code implementations JEPTALNRECITAL 2015 Christelle Rabary, Thomas Lavergne, Aurélie Névéol

As a follow-up to this work, we plan to use the annotated clinical corpus to improve part-of-speech tagging of clinical documents in French.

Oublier ce qu'on sait, pour mieux apprendre ce qu'on ne sait pas : une étude sur les contraintes de type dans les modèles CRF (Forgetting what we know to better learn what we do not: a study of type constraints in CRF models)

no code implementations JEPTALNRECITAL 2015 Nicolas Pécheux, Alexandre Allauzen, Thomas Lavergne, Guillaume Wisniewski, François Yvon

When prior knowledge about the possible outputs of a labeling problem is available, it seems desirable to include this information during training, both to simplify the modeling task and to speed up processing.

Joint Segmentation and POS Tagging for Arabic Using a CRF-based Classifier

no code implementations LREC 2012 Souhir Gahbiche-Braham, Hélène Bonneau-Maynard, Thomas Lavergne, François Yvon

Arabic is a morphologically rich language, and Arabic texts abound in complex word forms built by concatenating multiple subparts corresponding, for instance, to prepositions, articles, roots, prefixes, or suffixes.

BIG-bench Machine Learning · Machine Translation +5
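One standard way to cast joint segmentation and POS tagging as a single sequence-labelling problem, sketched here as an illustration rather than as the authors' system, is to label each character with a composite tag combining a segment-boundary marker (B/I) and the POS of its segment, and to train a linear-chain CRF over those composite tags. The toolkit (sklearn-crfsuite), the features and the transliterated toy examples are assumptions made only to keep the example runnable.

    import sklearn_crfsuite  # illustrative toolkit choice, not the CRF tool used in the paper

    def char_features(word, i):
        """Simple per-character features; a real system would use much richer context."""
        return {
            "char": word[i],
            "prev": word[i - 1] if i > 0 else "<s>",
            "next": word[i + 1] if i < len(word) - 1 else "</s>",
            "position": i,
        }

    # Toy training data: each character carries a composite label <B|I>-<POS>, so segment
    # boundaries and POS tags are predicted jointly by a single CRF.
    words = ["wktb", "alktab"]  # transliterated stand-ins for Arabic surface forms
    labels = [
        ["B-CONJ", "B-VERB", "I-VERB", "I-VERB"],                    # w + ktb  (conjunction + verb)
        ["B-DET", "I-DET", "B-NOUN", "I-NOUN", "I-NOUN", "I-NOUN"],  # al + ktab (article + noun)
    ]

    X = [[char_features(w, i) for i in range(len(w))] for w in words]
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
    crf.fit(X, labels)
    print(crf.predict(X))  # composite labels encode both segmentation and POS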
