no code implementations • LREC 2022 • Hicham El Boukkouri, Olivier Ferret, Thomas Lavergne, Pierre Zweigenbaum
BERT models used in specialized domains all seem to be the result of a simple strategy: initializing with the original BERT and then resuming pre-training on a specialized corpus.
no code implementations • IWSLT 2016 • Franck Burlot, Elena Knyazeva, Thomas Lavergne, François Yvon
This paper describes a two-step machine translation system that addresses the problem of translating into a morphologically rich language (English to Czech) by performing translation and the generation of target morphology as two separate steps.
no code implementations • IWSLT 2016 • Franck Burlot, Matthieu Labeau, Elena Knyazeva, Thomas Lavergne, Alexandre Allauzen, François Yvon
This paper describes LIMSI’s submission to the MT track of IWSLT 2016.
2 code implementations • 27 Mar 2024 • Lisa Raithel, Hui-Syuan Yeh, Shuntaro Yada, Cyril Grouin, Thomas Lavergne, Aurélie Névéol, Patrick Paroubek, Philippe Thomas, Tomohiro Nishiyama, Sebastian Möller, Eiji Aramaki, Yuji Matsumoto, Roland Roller, Pierre Zweigenbaum
User-generated data sources have gained significance in uncovering Adverse Drug Reactions (ADRs), with an increasing number of discussions occurring in the digital world.
no code implementations • LREC 2022 • Hui-Syuan Yeh, Thomas Lavergne, Pierre Zweigenbaum
In this paper, we investigate prompting for biomedical relation extraction, with experiments on the ChemProt dataset.
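To make the prompting setup concrete, here is a minimal sketch of a cloze-style prompt for relation extraction between two entities. The template, the verbalizer, and the label names below are illustrative assumptions, not the templates or labels used in the paper.

```python
# Illustrative cloze-style prompting for relation extraction.
# The template and verbalizer are hypothetical, not the paper's.
def build_prompt(sentence: str, chem: str, gene: str) -> str:
    """Wrap a sentence in a cloze template asking for the relation."""
    return f"{sentence} The relation between {chem} and {gene} is [MASK]."

# A verbalizer maps the label word predicted at [MASK] back to a
# relation class (class names here are made up for illustration).
VERBALIZER = {"inhibits": "CPR:4", "activates": "CPR:3", "unrelated": "NONE"}

prompt = build_prompt("Aspirin inhibits COX-1 activity.", "Aspirin", "COX-1")
print(prompt)
```

A masked language model would then score candidate label words at the `[MASK]` position, and the verbalizer converts the best-scoring word into a relation label.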
2 code implementations • COLING 2020 • Hicham El Boukkouri, Olivier Ferret, Thomas Lavergne, Hiroshi Noji, Pierre Zweigenbaum, Junichi Tsujii
Due to the compelling improvements brought by BERT, many recent representation models have adopted the Transformer architecture as their main building block, consequently inheriting the wordpiece tokenization system even though it is not intrinsically linked to the Transformer itself.
Ranked #1 on Semantic Similarity on ClinicalSTS
Tasks: Clinical Concept Extraction, Drug–drug Interaction Extraction, +3
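The wordpiece issue the abstract alludes to can be illustrated with a toy greedy longest-match tokenizer over a small, hand-picked vocabulary. Both the vocabulary and the example word below are illustrative; this is not the tokenizer or vocabulary from the paper.

```python
def wordpiece_tokenize(word, vocab):
    """Greedy longest-match-first wordpiece tokenization.
    Subword pieces after the first are prefixed with '##', as in BERT."""
    pieces, start = [], 0
    while start < len(word):
        end, cur = len(word), None
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub        # continuation-piece marker
            if sub in vocab:
                cur = sub               # longest match found
                break
            end -= 1
        if cur is None:
            return ["[UNK]"]            # no piece matches: unknown token
        pieces.append(cur)
        start = end
    return pieces

# Toy general-domain vocabulary: a specialized medical term gets
# shattered into many short, meaningless pieces.
vocab = {"cho", "##led", "##och", "##oli", "##tho", "##tomy"}
print(wordpiece_tokenize("choledocholithotomy", vocab))
# → ['cho', '##led', '##och', '##oli', '##tho', '##tomy']
```

This fragmentation of domain-specific terms is the kind of mismatch that motivates working at the character level instead.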
no code implementations • LREC 2020 • Arnaud Ferré, Robert Bossy, Mouhamadou Ba, Louise Deléger, Thomas Lavergne, Pierre Zweigenbaum, Claire Nédellec
We propose a new approach to address the scarcity of training data that extends the CONTES method by corpus selection, pre-processing and weak supervision strategies, which can yield high-performance results without any manually annotated examples.
1 code implementation • ACL 2019 • Hicham El Boukkouri, Olivier Ferret, Thomas Lavergne, Pierre Zweigenbaum
Using pre-trained word embeddings in conjunction with Deep Learning models has become the "de facto" approach in Natural Language Processing (NLP).
Ranked #4 on Clinical Concept Extraction on 2010 i2b2/VA
2 code implementations • 30 May 2019 • Rachel Bawden, Sophie Rosset, Thomas Lavergne, Eric Bilinski
We provide a preliminary analysis of the corpus to confirm that the participants' judgments reveal perceptible differences in MT quality between the two MT systems used.
no code implementations • JEPTALNRECITAL 2018 • Rachel Bawden, Thomas Lavergne, Sophie Rosset
In this article, we provide several approaches to the automatic identification of parallel sentences that require sentence-external linguistic context to be correctly translated.
no code implementations • LREC 2018 • Delphine Bernhard, Anne-Laure Ligozat, Fanny Martin, Myriam Bras, Pierre Magistry, Marianne Vergez-Couret, Lucie Steiblé, Pascale Erhart, Nabil Hathout, Dominique Huck, Christophe Rey, Philippe Reynés, Sophie Rosset, Jean Sibille, Thomas Lavergne
no code implementations • EMNLP 2017 • Thomas Lavergne, François Yvon
The computational complexity of linear-chain Conditional Random Fields (CRFs) makes it difficult to deal with very large label sets and long range dependencies.
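The complexity issue can be seen in the forward algorithm for a linear-chain CRF: each sentence costs O(T·|Y|²) in the sequence length T and label-set size |Y|, so the quadratic term dominates for large label sets. Below is a minimal log-space sketch with generic score matrices, not the paper's model.

```python
import math

def forward_log_partition(log_emit, log_trans):
    """Forward algorithm for a linear-chain CRF in log space.

    log_emit:  T x Y per-position label scores.
    log_trans: Y x Y transition scores.
    The inner double loop over labels makes the cost O(T * Y^2),
    which is what becomes expensive for very large label sets.
    """
    Y = len(log_trans)
    alpha = list(log_emit[0])                      # scores for position 0
    for t in range(1, len(log_emit)):
        alpha = [
            math.log(sum(math.exp(alpha[i] + log_trans[i][j])
                         for i in range(Y))) + log_emit[t][j]
            for j in range(Y)
        ]
    # log of the partition function Z (sum over all label sequences)
    return math.log(sum(math.exp(a) for a in alpha))
```

Long-range dependencies compound the problem, since naively encoding them into the label set multiplies |Y|.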
no code implementations • JEPTALNRECITAL 2017 • Pierre Zweigenbaum, Thomas Lavergne
We hypothesize that annotation at a finer level of granularity, typically at the utterance level, should improve the performance of an automatic detector trained on these data.
no code implementations • JEPTALNRECITAL 2017 • Christopher Norman, Cyril Grouin, Thomas Lavergne, Aurélie Névéol, Pierre Zweigenbaum
We present demonstrations of three tools developed by LIMSI for natural language processing in the biomedical domain: detecting medical concepts in short texts, categorizing scientific articles to assist in writing systematic reviews, and de-identifying clinical texts.
no code implementations • WS 2016 • Thomas Lavergne, Aurélie Névéol, Aude Robert, Cyril Grouin, Grégoire Rey, Pierre Zweigenbaum
Very few datasets have been released for the evaluation of diagnosis coding with the International Classification of Diseases, and only one so far in a language other than English.
no code implementations • WS 2016 • Pierre Zweigenbaum, Cyril Grouin, Thomas Lavergne
In some plain text documents, end-of-line marks may or may not mark the boundary of a text unit (e.g., of a paragraph).
no code implementations • WS 2016 • Jan-Thorsten Peter, Tamer Alkhouli, Hermann Ney, Matthias Huck, Fabienne Braune, Alexander Fraser, Aleš Tamchyna, Ondřej Bojar, Barry Haddow, Rico Sennrich, Frédéric Blain, Lucia Specia, Jan Niehues, Alex Waibel, Alexandre Allauzen, Lauriane Aufrant, Franck Burlot, Elena Knyazeva, Thomas Lavergne, François Yvon, Mārcis Pinnis, Stella Frank
Ranked #12 on Machine Translation on WMT2016 English-Romanian
no code implementations • JEPTALNRECITAL 2016 • Pierre Zweigenbaum, Cyril Grouin, Thomas Lavergne
We propose a fully unsupervised method for determining whether an end-of-line should be treated as a simple space or as a true text-unit boundary, and we test it on a corpus of medical reports.
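For intuition, the decision can be sketched as a simple rule-based classifier over surface cues. The features and thresholds below are illustrative assumptions for exposition only; they are not the unsupervised method described in the paper.

```python
def is_paragraph_boundary(line: str, next_line: str) -> bool:
    """Heuristic: decide whether the end-of-line after `line` is a real
    text-unit boundary or a mere line wrap. The cues below (blank lines,
    terminal punctuation, short line + following capital) are illustrative,
    not the paper's unsupervised method."""
    if not next_line.strip():
        return True                  # blank line: clear boundary
    if line.rstrip().endswith((".", "!", "?", ":")):
        return True                  # sentence-final punctuation
    if next_line.lstrip()[:1].isupper() and len(line) < 60:
        return True                  # short line followed by a capital
    return False                     # otherwise: treat as a line wrap
```

An unsupervised approach would instead estimate such cues from the corpus itself rather than hard-coding them.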
no code implementations • JEPTALNRECITAL 2015 • Nicolas Pécheux, Alexandre Allauzen, Thomas Lavergne, Guillaume Wisniewski, François Yvon
When prior knowledge about the possible outputs of a labeling problem is available, it seems desirable to include this information during learning, both to simplify the modeling task and to speed up processing.
no code implementations • JEPTALNRECITAL 2015 • Christelle Rabary, Thomas Lavergne, Aurélie Névéol
As a continuation of this work, we envision applying the annotated clinical corpus to improve part-of-speech tagging of clinical documents in French.
no code implementations • LREC 2014 • Thomas Lavergne, Gilles Adda, Martine Adda-Decker, Lori Lamel
This language identification system was used to select textual data extracted from the web, in order to build a lexicon and language models.
Tasks: Automatic Speech Recognition (ASR), +5
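As a rough illustration of how such text selection can work, here is a toy character-trigram language identifier. The profiles, seed sentences, and scoring are stand-ins invented for this sketch; they are not the system described in the paper.

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Count overlapping character n-grams of a string."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def ngram_overlap(text, profile):
    """Sum the text's trigram counts for trigrams seen in a language
    profile (a crude stand-in for a real n-gram language-ID score)."""
    return sum(c for g, c in char_ngrams(text).items() if g in profile)

# Toy per-language profiles built from tiny seed sentences (illustrative).
profiles = {
    "en": char_ngrams("the quick brown fox jumps over the lazy dog"),
    "fr": char_ngrams("le renard brun rapide saute par dessus le chien"),
}

def identify(text):
    """Pick the language whose profile overlaps the text most."""
    return max(profiles, key=lambda lang: ngram_overlap(text, profiles[lang]))

print(identify("the dog jumps over"))  # → en
```

A production system would use much larger profiles and probabilistic scoring, but the selection principle (keep web text whose score for the target language is highest) is the same.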
no code implementations • JEPTALNRECITAL 2012 • Souhir Gahbiche-Braham, Hélène Bonneau-Maynard, Thomas Lavergne, François Yvon
no code implementations • LREC 2012 • Souhir Gahbiche-Braham, Hélène Bonneau-Maynard, Thomas Lavergne, François Yvon
Arabic is a morphologically rich language, and Arabic texts abound in complex word forms built by concatenating multiple subparts corresponding, for instance, to prepositions, articles, roots, prefixes, or suffixes.