no code implementations • COLING (LaTeCHCLfL, CLFL, LaTeCH) 2020 • Alex Zhai, Zheng Zhang, Amel Fraisse, Ronald Jenn, Shelley Fisher Fishkin, Pierre Zweigenbaum
TL-Explorer is a digital humanities tool for mapping and analyzing translated literature, encompassing the World Map and the Translation Dashboard.
no code implementations • EAMT 2020 • Ēriks Ajausks, Victoria Arranz, Laurent Bié, Aleix Cerdà-i-Cucó, Khalid Choukri, Montse Cuadros, Hans Degroote, Amando Estela, Thierry Etchegoyhen, Mercedes García-Martínez, Aitor García-Pablos, Manuel Herranz, Alejandro Kohan, Maite Melero, Mike Rosner, Roberts Rozis, Patrick Paroubek, Artūrs Vasiļevskis, Pierre Zweigenbaum
We describe the MAPA project, funded under the Connecting Europe Facility programme, whose goal is the development of an open-source de-identification toolkit for all official European Union languages.
no code implementations • LREC 2022 • Hicham El Boukkouri, Olivier Ferret, Thomas Lavergne, Pierre Zweigenbaum
BERT models used in specialized domains all seem to be the result of a simple strategy: initializing with the original BERT and then resuming pre-training on a specialized corpus.
1 code implementation • LREC 2022 • Lisa Raithel, Philippe Thomas, Roland Roller, Oliver Sapina, Sebastian Möller, Pierre Zweigenbaum
In this work, we present the first corpus for German Adverse Drug Reaction (ADR) detection in patient-generated content.
no code implementations • LREC 2022 • Omar Adjali, Emmanuel Morin, Pierre Zweigenbaum
To that end, we exploit parallel corpora to perform automatic bilingual MWT extraction and comparable corpus construction.
no code implementations • LEGAL (LREC) 2022 • Victoria Arranz, Khalid Choukri, Montse Cuadros, Aitor García Pablos, Lucie Gianola, Cyril Grouin, Manuel Herranz, Patrick Paroubek, Pierre Zweigenbaum
This paper presents the outcomes of the MAPA project: a set of annotated corpora for 24 languages of the European Union and an open-source, customisable toolkit able to detect and substitute sensitive information in text documents from any domain, using state-of-the-art, deep learning-based named entity recognition techniques.
no code implementations • 5 Apr 2024 • Mathilde Aguiar, Pierre Zweigenbaum, Nona Naderi
This paper describes our submission to Task 2 of SemEval-2024: Safe Biomedical Natural Language Inference for Clinical Trials.
2 code implementations • 27 Mar 2024 • Lisa Raithel, Hui-Syuan Yeh, Shuntaro Yada, Cyril Grouin, Thomas Lavergne, Aurélie Névéol, Patrick Paroubek, Philippe Thomas, Tomohiro Nishiyama, Sebastian Möller, Eiji Aramaki, Yuji Matsumoto, Roland Roller, Pierre Zweigenbaum
User-generated data sources have gained significance in uncovering Adverse Drug Reactions (ADRs), with an increasing number of discussions occurring in the digital world.
1 code implementation • 3 Aug 2022 • Lisa Raithel, Philippe Thomas, Roland Roller, Oliver Sapina, Sebastian Möller, Pierre Zweigenbaum
In this work, we present the first corpus for German Adverse Drug Reaction (ADR) detection in patient-generated content.
no code implementations • LREC 2022 • Hui-Syuan Yeh, Thomas Lavergne, Pierre Zweigenbaum
In this paper, we investigate prompting for biomedical relation extraction, with experiments on the ChemProt dataset.
no code implementations • 25 Nov 2021 • Anfu Tang, Claire Nédellec, Pierre Zweigenbaum, Louise Deléger, Robert Bossy
We investigate a method to extract relations from texts based on global alignment and syntactic information.
1 code implementation • 25 Nov 2021 • Anfu Tang, Louise Deléger, Robert Bossy, Pierre Zweigenbaum, Claire Nédellec
Recently, many studies have been conducted on the topic of relation extraction.
1 code implementation • BMC Bioinformatics 2020 • Arnaud Ferré, Louise Deléger, Robert Bossy, Pierre Zweigenbaum, Claire Nédellec
Entity normalization is an important information extraction task which has gained renewed attention in the last decade, particularly in the biomedical and life science domains.
Ranked #1 on Medical Concept Normalization on BB-norm-phenotype
2 code implementations • COLING 2020 • Hicham El Boukkouri, Olivier Ferret, Thomas Lavergne, Hiroshi Noji, Pierre Zweigenbaum, Junichi Tsujii
Due to the compelling improvements brought by BERT, many recent representation models adopted the Transformer architecture as their main building block, consequently inheriting the wordpiece tokenization system despite it not being intrinsically linked to the notion of Transformers.
Ranked #1 on Semantic Similarity on ClinicalSTS
Clinical Concept Extraction, Drug–drug Interaction Extraction
no code implementations • LREC 2020 • Arnaud Ferré, Robert Bossy, Mouhamadou Ba, Louise Deléger, Thomas Lavergne, Pierre Zweigenbaum, Claire Nédellec
We propose a new approach to address the scarcity of training data that extends the CONTES method by corpus selection, pre-processing and weak supervision strategies, which can yield high-performance results without any manually annotated examples.
no code implementations • LREC 2020 • Reinhard Rapp, Pierre Zweigenbaum, Serge Sharoff
The shared task of the 13th Workshop on Building and Using Comparable Corpora was devoted to the induction of bilingual dictionaries from comparable rather than parallel corpora.
no code implementations • 18 Sep 2019 • Zheng Zhang, Ruiqing Yin, Jun Zhu, Pierre Zweigenbaum
Recent work in cross-lingual contextual word embedding learning cannot handle multi-sense words well.
1 code implementation • ACL 2019 • Hicham El Boukkouri, Olivier Ferret, Thomas Lavergne, Pierre Zweigenbaum
Using pre-trained word embeddings in conjunction with Deep Learning models has become the "de facto" approach in Natural Language Processing (NLP).
Ranked #4 on Clinical Concept Extraction on 2010 i2b2/VA
no code implementations • ACL 2018 • Zheng Zhang, Pierre Zweigenbaum
Negative sampling is an important component in word2vec for distributed word representation learning.
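The snippet only states that negative sampling matters. For reference, the standard word2vec scheme draws negative words from the unigram distribution raised to the 3/4 power. The sketch below illustrates that baseline scheme only, not the method proposed in this paper; the helper names `noise_distribution` and `sample_negatives` are hypothetical, and excluding the positive word from the candidate pool is a simplification assumed here.

```python
import random
from typing import Dict, List


def noise_distribution(counts: Dict[str, int], power: float = 0.75) -> Dict[str, float]:
    """Smoothed unigram distribution P_n(w) proportional to count(w) ** power.

    word2vec uses power = 0.75, which flattens the raw unigram distribution so
    that rare words are drawn as negatives more often than their frequency alone
    would suggest.
    """
    weights = {word: count ** power for word, count in counts.items()}
    total = sum(weights.values())
    return {word: weight / total for word, weight in weights.items()}


def sample_negatives(
    dist: Dict[str, float], k: int, positive: str, rng: random.Random
) -> List[str]:
    """Draw k negative words from `dist`, never returning the positive word.

    Hypothetical helper: it removes the positive word from the candidate pool
    up front, whereas the reference word2vec C code simply skips a draw that
    hits the positive word.
    """
    candidates = [word for word in dist if word != positive]
    weights = [dist[word] for word in candidates]
    # random.Random.choices samples with replacement; weights need not sum to 1.
    return rng.choices(candidates, weights=weights, k=k)


counts = {"the": 81, "cat": 16, "mat": 1}
dist = noise_distribution(counts)
print({word: round(p, 3) for word, p in dist.items()})
negatives = sample_negatives(dist, k=5, positive="cat", rng=random.Random(0))
print(negatives)
```

With these toy counts the smoothed weights are 27, 8 and 1, giving probabilities 0.75, 8/36 ≈ 0.222 and 1/36 ≈ 0.028, versus roughly 0.827, 0.163 and 0.010 under the raw unigram distribution.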
1 code implementation • WS 2018 • Zheng Zhang, Pierre Zweigenbaum, Ruiqing Yin
Corpus2graph is an open-source NLP-application-oriented tool that generates a word co-occurrence network from a large corpus.
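Corpus2graph's own API is not described here. As a hedged illustration of the underlying idea only, the sketch below builds a weighted word co-occurrence network with a sliding window. The function name `build_cooccurrence_graph`, the window-based definition of co-occurrence, and the decision to drop self-loops are assumptions for illustration, not Corpus2graph's documented behavior.

```python
from collections import Counter
from typing import Iterable, List


def build_cooccurrence_graph(sentences: Iterable[List[str]], window: int = 2) -> Counter:
    """Map undirected word pairs (sorted tuples) to co-occurrence counts.

    Hypothetical helper, not the Corpus2graph API. Two tokens are linked when
    they appear within `window` positions of each other in the same sentence.
    """
    edges: Counter = Counter()
    for tokens in sentences:
        for i, left in enumerate(tokens):
            # Look ahead only, so each pair of positions is counted once.
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                right = tokens[j]
                if left == right:
                    continue  # assumption: self-loops are dropped
                # Sorting the pair makes the edge undirected.
                edges[tuple(sorted((left, right)))] += 1
    return edges


corpus = [["the", "cat", "sat", "on", "the", "mat"]]
graph = build_cooccurrence_graph(corpus, window=2)
print(graph[("sat", "the")])
```

Here "sat" falls within two positions of both occurrences of "the", so the edge ("sat", "the") gets weight 2, while every other edge in this toy sentence has weight 1.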
no code implementations • WS 2017 • Leonardo Campillos Llanos, Sophie Rosset, Pierre Zweigenbaum
We present the work-in-progress of automating the classification of doctor-patient questions in the context of a simulated consultation with a virtual patient.
no code implementations • WS 2017 • Arnaud Ferré, Pierre Zweigenbaum, Claire Nédellec
The method generates continuous vector representations of complex terms in a semantic space structured by the ontology.
no code implementations • WS 2017 • Zheng Zhang, Pierre Zweigenbaum
This paper describes the zNLP system for the BUCC 2017 shared task.
no code implementations • WS 2017 • Pierre Zweigenbaum, Serge Sharoff, Reinhard Rapp
We manually examined a small sample of the false-negative sentence pairs for the most precise French-English runs and estimated the number of parallel sentence pairs not yet in the provided gold standard.
no code implementations • JEPTALNRECITAL 2017 • Christopher Norman, Cyril Grouin, Thomas Lavergne, Aurélie Névéol, Pierre Zweigenbaum
We offer demonstrations of three natural language processing tools developed by LIMSI for the biomedical domain: detection of medical concepts in short texts, categorization of scientific articles to assist in writing systematic reviews, and anonymization of clinical texts.
no code implementations • JEPTALNRECITAL 2017 • Christopher Norman, Mariska Leeflang, Pierre Zweigenbaum, Aurélie Névéol
We apply a logistic regression model to two corpora derived from systematic reviews conducted in the fields of natural language processing and drug efficacy.
no code implementations • JEPTALNRECITAL 2017 • Pierre Zweigenbaum, Thomas Lavergne
We hypothesize that annotation at a finer level of granularity, typically at the utterance level, should improve the performance of an automatic detector trained on these data.
no code implementations • WS 2016 • Pierre Zweigenbaum, Cyril Grouin, Thomas Lavergne
In some plain-text documents, end-of-line marks may or may not mark the boundary of a text unit (e.g., of a paragraph).
no code implementations • WS 2016 • Eva D'hondt, Cyril Grouin, Aurélie Névéol, Efstathios Stamatatos, Pierre Zweigenbaum
Electronic Health Records (EHRs) are increasingly available in modern health care institutions, either through the direct creation of electronic documents in hospitals' health information systems or through the digitization of historical paper records.
no code implementations • WS 2016 • Thomas Lavergne, Aurélie Névéol, Aude Robert, Cyril Grouin, Grégoire Rey, Pierre Zweigenbaum
Very few datasets have been released for the evaluation of diagnosis coding with the International Classification of Diseases, and only one so far in a language other than English.
no code implementations • WS 2016 • Estelle Chaix, Bertrand Dubreucq, Abdelhak Fatihi, Dialekti Valsamou, Robert Bossy, Mouhamadou Ba, Louise Deléger, Pierre Zweigenbaum, Philippe Bessières, Loic Lepiniec, Claire Nédellec
no code implementations • JEPTALNRECITAL 2016 • Wafa Neifar, Thierry Hamon, Pierre Zweigenbaum, Mariem Ellouze, Lamia Hadrich Belguith
The adaptation first consisted in describing the term extraction process in a way similar to the one defined for English and French, while taking into account certain morphosyntactic particularities of the Arabic language.
no code implementations • JEPTALNRECITAL 2016 • Pierre Zweigenbaum, Cyril Grouin, Thomas Lavergne
We propose a fully unsupervised method to determine whether an end of line should be treated as a simple space or as a true text-unit boundary, and we test it on a corpus of medical reports.
no code implementations • LREC 2016 • Leonardo Campillos Llanos, Dhouha Bouamor, Pierre Zweigenbaum, Sophie Rosset
We introduce a dialogue task between a virtual patient and a doctor where the dialogue system, playing the patient part in a simulated consultation, must reconcile a specialized level, to understand what the doctor says, and a lay level, to output realistic patient-language utterances.
no code implementations • LREC 2016 • François Morlane-Hondère, Cyril Grouin, Pierre Zweigenbaum
When trained on the output of the first classifier, the second classifier achieves the following performance: P = 0.683; R = 0.956; F1 = 0.797.
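As a quick consistency check on the figures reported for the second classifier, reading p and r as precision and recall (the usual convention, assumed here), F1 is their harmonic mean, and the reported value follows from the other two:

```latex
F_1 = \frac{2pr}{p + r}
    = \frac{2 \times 0.683 \times 0.956}{0.683 + 0.956}
    = \frac{1.305896}{1.639}
    \approx 0.797
```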
no code implementations • LREC 2016 • Dhouha Bouamor, Leonardo Campillos Llanos, Anne-Laure Ligozat, Sophie Rosset, Pierre Zweigenbaum
While measuring the readability of texts has been a long-standing research topic, assessing the technicality of terms has only been addressed more recently and mostly for the English language.
no code implementations • JEPTALNRECITAL 2015 • François Morlane-Hondère, Cyril Grouin, Pierre Zweigenbaum
We believe that analyzing these variants could make it possible to model the errors forum users make when writing drug names, and thereby improve information retrieval systems.
no code implementations • JEPTALNRECITAL 2015 • Leonardo Campillos, Dhouha Bouamor, Éric Bilinski, Anne-Laure Ligozat, Pierre Zweigenbaum, Sophie Rosset
The demonstrator we describe here is a prototype dialogue system whose goal is to simulate a patient.
no code implementations • JEPTALNRECITAL 2015 • François Morlane-Hondère, Cyril Grouin, Véronique Moriceau, Pierre Zweigenbaum
In this article, we study how the relations between a medical treatment and a side effect are expressed.
no code implementations • JEPTALNRECITAL 2015 • Cyril Grouin, Véronique Moriceau, Sophie Rosset, Pierre Zweigenbaum
In this article, we present the methods we developed to analyze hospital reports written in English.
no code implementations • JEPTALNRECITAL 2014 • Thierry Hamon, Quentin Pleplé, Patrick Paroubek, Pierre Zweigenbaum, Cyril Grouin
no code implementations • LREC 2014 • Maria Evangelia Chatzimina, Cyril Grouin, Pierre Zweigenbaum
We design and test two syntax-based methods to produce word classes: one applies the Brown clustering algorithm to syntactic dependencies, the other collects latent categories created by a PCFG-LA parser.
no code implementations • LREC 2014 • Louise Deléger, Anne-Laure Ligozat, Cyril Grouin, Pierre Zweigenbaum, Aurélie Névéol
We present the annotation scheme as well as the results of a pilot annotation study covering 35 clinical documents in a variety of subfields and genres.
no code implementations • LREC 2014 • Aurélie Névéol, Julien Grosjean, Stéfan Darmoni, Pierre Zweigenbaum
The biomedical domain offers a wealth of linguistic resources for Natural Language Processing, including terminologies and corpora.
no code implementations • LREC 2014 • Cyril Grouin, Jeremy Leixa, Aurélie Névéol, Sophie Rosset, Xavier Tannier, Pierre Zweigenbaum
Overall, a total of 26,409 entity annotations were mapped to 5,797 unique UMLS concepts.
no code implementations • JEPTALNRECITAL 2012 • Asma Ben Abacha, Pierre Zweigenbaum, Aurélien Max
no code implementations • JEPTALNRECITAL 2012 • Patrick Paroubek, Pierre Zweigenbaum, Dominic Forest, Cyril Grouin
no code implementations • LREC 2012 • Olivier Galibert, Sophie Rosset, Cyril Grouin, Pierre Zweigenbaum, Ludovic Quintard
Within the framework of the Quaero project, we proposed a new definition of named entities, based upon an extension of the coverage of named entities as well as the structure of those named entities.
Named Entity Recognition (NER), Optical Character Recognition (OCR)
no code implementations • LREC 2012 • Dhouha Bouamor, Nasredine Semmar, Pierre Zweigenbaum
MultiWord Expressions (MWEs) represent a key issue for numerous applications in Natural Language Processing (NLP), especially Machine Translation (MT).