Search Results for author: Pierre Zweigenbaum

Found 69 papers, 8 papers with code

Re-train or Train from Scratch? Comparing Pre-training Strategies of BERT in the Medical Domain

no code implementations LREC 2022 Hicham El Boukkouri, Olivier Ferret, Thomas Lavergne, Pierre Zweigenbaum

BERT models used in specialized domains all seem to be the result of a simple strategy: initializing with the original BERT and then resuming pre-training on a specialized corpus.

Building Comparable Corpora for Assessing Multi-Word Term Alignment

no code implementations LREC 2022 Omar Adjali, Emmanuel Morin, Pierre Zweigenbaum

To that aim, we exploit parallel corpora to perform automatic bilingual MWT extraction and comparable corpus construction.

Machine Translation

MAPA Project: Ready-to-Go Open-Source Datasets and Deep Learning Technology to Remove Identifying Information from Text Documents

no code implementations LEGAL (LREC) 2022 Victoria Arranz, Khalid Choukri, Montse Cuadros, Aitor García Pablos, Lucie Gianola, Cyril Grouin, Manuel Herranz, Patrick Paroubek, Pierre Zweigenbaum

This paper presents the outcomes of the MAPA project, a set of annotated corpora for 24 languages of the European Union and an open-source customisable toolkit able to detect and substitute sensitive information in text documents from any domain, using state-of-the-art, deep learning-based named entity recognition techniques.

De-identification named-entity-recognition +2

Global alignment for relation extraction in Microbiology

no code implementations 25 Nov 2021 Anfu Tang, Claire Nédellec, Pierre Zweigenbaum, Louise Deléger, Robert Bossy

We investigate a method to extract relations from texts based on global alignment and syntactic information.

Relation Relation Extraction

C-Norm: a neural approach to few-shot entity normalization

1 code implementation BMC Bioinformatics 2020 Arnaud Ferré, Louise Deléger, Robert Bossy, Pierre Zweigenbaum, Claire Nédellec

Entity normalization is an important information extraction task which has gained renewed attention in the last decade, particularly in the biomedical and life science domains.

Few-Shot Learning Medical Concept Normalization

CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters

2 code implementations COLING 2020 Hicham El Boukkouri, Olivier Ferret, Thomas Lavergne, Hiroshi Noji, Pierre Zweigenbaum, Junichi Tsujii

Due to the compelling improvements brought by BERT, many recent representation models adopted the Transformer architecture as their main building block, consequently inheriting the wordpiece tokenization system despite it not being intrinsically linked to the notion of Transformers.

Clinical Concept Extraction Drug–drug Interaction Extraction +3

Handling Entity Normalization with no Annotated Corpus: Weakly Supervised Methods Based on Distributional Representation and Ontological Information

no code implementations LREC 2020 Arnaud Ferré, Robert Bossy, Mouhamadou Ba, Louise Deléger, Thomas Lavergne, Pierre Zweigenbaum, Claire Nédellec

We propose a new approach to address the scarcity of training data that extends the CONTES method by corpus selection, pre-processing and weak supervision strategies, which can yield high-performance results without any manually annotated examples.

BIG-bench Machine Learning Entity Linking

Overview of the Fourth BUCC Shared Task: Bilingual Dictionary Induction from Comparable Corpora

no code implementations LREC 2020 Reinhard Rapp, Pierre Zweigenbaum, Serge Sharoff

The shared task of the 13th Workshop on Building and Using Comparable Corpora was devoted to the induction of bilingual dictionaries from comparable rather than parallel corpora.

Efficient Generation and Processing of Word Co-occurrence Networks Using corpus2graph

1 code implementation WS 2018 Zheng Zhang, Pierre Zweigenbaum, Ruiqing Yin

Corpus2graph is an open-source NLP-application-oriented tool that generates a word co-occurrence network from a large corpus.

Keyword Extraction

Automatic classification of doctor-patient questions for a virtual patient record query task

no code implementations WS 2017 Leonardo Campillos Llanos, Sophie Rosset, Pierre Zweigenbaum

We present the work-in-progress of automating the classification of doctor-patient questions in the context of a simulated consultation with a virtual patient.

BIG-bench Machine Learning Dialogue Management +4

Overview of the Second BUCC Shared Task: Spotting Parallel Sentences in Comparable Corpora

no code implementations WS 2017 Pierre Zweigenbaum, Serge Sharoff, Reinhard Rapp

We examined manually a small sample of the false negative sentence pairs for the most precise French-English runs and estimated the number of parallel sentence pairs not yet in the provided gold standard.

Machine Translation Sentence

Traitement automatique de la langue biomédicale au LIMSI (Biomedical language processing at LIMSI)

no code implementations JEPTALNRECITAL 2017 Christopher Norman, Cyril Grouin, Thomas Lavergne, Aurélie Névéol, Pierre Zweigenbaum

We offer demonstrations of three tools developed at LIMSI in natural language processing applied to the biomedical domain: the detection of medical concepts in short texts, the categorization of scientific articles to assist in writing systematic reviews, and the anonymization of clinical texts.

Tri Automatique de la Littérature pour les Revues Systématiques (Automatically Ranking the Literature in Support of Systematic Reviews)

no code implementations JEPTALNRECITAL 2017 Christopher Norman, Mariska Leeflang, Pierre Zweigenbaum, Aurélie Névéol

We apply a logistic regression model to two corpora drawn from systematic reviews conducted in the fields of natural language processing and drug efficacy.

Classification General Classification

Détection de concepts et granularité de l'annotation (Concept detection and annotation granularity)

no code implementations JEPTALNRECITAL 2017 Pierre Zweigenbaum, Thomas Lavergne

We hypothesize that annotation at a finer level of granularity, typically at the utterance level, should improve the performance of an automatic detector trained on these data.

Detection of Text Reuse in French Medical Corpora

no code implementations WS 2016 Eva D'hondt, Cyril Grouin, Aurélie Névéol, Efstathios Stamatatos, Pierre Zweigenbaum

Electronic Health Records (EHRs) are increasingly available in modern health care institutions either through the direct creation of electronic documents in hospitals' health information systems, or through the digitization of historical paper records.

De-identification Optical Character Recognition (OCR)

A Dataset for ICD-10 Coding of Death Certificates: Creation and Usage

no code implementations WS 2016 Thomas Lavergne, Aurélie Névéol, Aude Robert, Cyril Grouin, Grégoire Rey, Pierre Zweigenbaum

Very few datasets have been released for the evaluation of diagnosis coding with the International Classification of Diseases, and only one so far in a language other than English.

Named Entity Recognition (NER)

Impact de l'agglutination dans l'extraction de termes en arabe standard moderne (Adaptation of a term extractor to the Modern Standard Arabic language)

no code implementations JEPTALNRECITAL 2016 Wafa Neifar, Thierry Hamon, Pierre Zweigenbaum, Mariem Ellouze, Lamia Hadrich Belguith

The adaptation first consisted in describing the term extraction process in a manner similar to that defined for English and French, taking into account certain morpho-syntactic particularities of the Arabic language.

Une catégorisation de fins de lignes non-supervisée (End-of-line classification with no supervision)

no code implementations JEPTALNRECITAL 2016 Pierre Zweigenbaum, Cyril Grouin, Thomas Lavergne

We propose a fully unsupervised method to determine whether a line ending should be treated as a simple space or as a true textual unit boundary, and test it on a corpus of medical reports.

Managing Linguistic and Terminological Variation in a Medical Dialogue System

no code implementations LREC 2016 Leonardo Campillos Llanos, Dhouha Bouamor, Pierre Zweigenbaum, Sophie Rosset

We introduce a dialogue task between a virtual patient and a doctor where the dialogue system, playing the patient part in a simulated consultation, must reconcile a specialized level, to understand what the doctor says, and a lay level, to output realistic patient-language utterances.

Sentence Spoken Language Understanding

Identification of Drug-Related Medical Conditions in Social Media

no code implementations LREC 2016 François Morlane-Hondère, Cyril Grouin, Pierre Zweigenbaum

When trained on the output of the first classifier, the second classifier's performance is as follows: p=0.683; r=0.956; f1=0.797.

Transfer-Based Learning-to-Rank Assessment of Medical Term Technicality

no code implementations LREC 2016 Dhouha Bouamor, Leonardo Campillos Llanos, Anne-Laure Ligozat, Sophie Rosset, Pierre Zweigenbaum

While measuring the readability of texts has been a long-standing research topic, assessing the technicality of terms has only been addressed more recently and mostly for the English language.

Language Modelling Learning-To-Rank

Étude des verbes introducteurs de noms de médicaments dans les forums de santé

no code implementations JEPTALNRECITAL 2015 François Morlane-Hondère, Cyril Grouin, Pierre Zweigenbaum

We believe that analyzing these variants could make it possible to model the errors forum users make when writing drug names, and consequently to improve information retrieval systems.

Un patient virtuel dialogant

no code implementations JEPTALNRECITAL 2015 Leonardo Campillos, Dhouha Bouamor, Éric Bilinski, Anne-Laure Ligozat, Pierre Zweigenbaum, Sophie Rosset

The demonstrator we describe here is a prototype dialogue system whose goal is to simulate a patient.

Identification de facteurs de risque pour des patients diabétiques à partir de comptes-rendus cliniques par des approches hybrides

no code implementations JEPTALNRECITAL 2015 Cyril Grouin, Véronique Moriceau, Sophie Rosset, Pierre Zweigenbaum

In this article, we present the methods we developed to analyze hospital reports written in English.

Use of unsupervised word classes for entity recognition: Application to the detection of disorders in clinical reports

no code implementations LREC 2014 Maria Evangelia Chatzimina, Cyril Grouin, Pierre Zweigenbaum

We design and test two syntax-based methods to produce word classes: one applies the Brown clustering algorithm to syntactic dependencies, the other collects latent categories created by a PCFG-LA parser.

Chunking Clustering +2

Annotation of specialized corpora using a comprehensive entity and relation scheme

no code implementations LREC 2014 Louise Deléger, Anne-Laure Ligozat, Cyril Grouin, Pierre Zweigenbaum, Aurélie Névéol

We present the annotation scheme as well as the results of a pilot annotation study covering 35 clinical documents in a variety of subfields and genres.

Relation

Language Resources for French in the Biomedical Domain

no code implementations LREC 2014 Aurélie Névéol, Julien Grosjean, Stéfan Darmoni, Pierre Zweigenbaum

The biomedical domain offers a wealth of linguistic resources for Natural Language Processing, including terminologies and corpora.

Extended Named Entities Annotation on OCRed Documents: From Corpus Constitution to Evaluation Campaign

no code implementations LREC 2012 Olivier Galibert, Sophie Rosset, Cyril Grouin, Pierre Zweigenbaum, Ludovic Quintard

Within the framework of the Quaero project, we proposed a new definition of named entities, based upon an extension of the coverage of named entities as well as the structure of those named entities.

Named Entity Recognition (NER) Optical Character Recognition (OCR)

Identifying bilingual Multi-Word Expressions for Statistical Machine Translation

no code implementations LREC 2012 Dhouha Bouamor, Nasredine Semmar, Pierre Zweigenbaum

MultiWord Expressions (MWEs) represent a key issue for numerous applications in Natural Language Processing (NLP), especially for Machine Translation (MT).

Machine Translation Translation
