1 code implementation • LREC 2022 • Alexandra Benamar, Cyril Grouin, Meryl Bothua, Anne Vilnat
Our experiments led to three findings: (1) it is easier to improve the representation of new words (A and B) than that of words that already exist in the vocabulary of the Transformer models (C); (2) to improve the representation of OOVs, the most effective method relies on adding external morpho-syntactic context rather than on improving the semantic understanding of the words directly (fine-tuning); and (3) the impact of minor misspellings cannot be foreseen, because similar misspellings affect word representations differently.
no code implementations • JEP/TALN/RECITAL 2021 • Cyril Grouin, Natalia Grabar, Gabriel Illouz
The Défi Fouille de Textes (DEFT, "text mining challenge") is an annual French-language evaluation campaign.
no code implementations • LEGAL (LREC) 2022 • Victoria Arranz, Khalid Choukri, Montse Cuadros, Aitor García Pablos, Lucie Gianola, Cyril Grouin, Manuel Herranz, Patrick Paroubek, Pierre Zweigenbaum
This paper presents the outcomes of the MAPA project: a set of annotated corpora for 24 languages of the European Union and an open-source, customisable toolkit able to detect and substitute sensitive information in text documents from any domain, using state-of-the-art, deep-learning-based named entity recognition techniques.
no code implementations • JEP/TALN/RECITAL 2022 • Cyril Grouin, Gabriel Illouz
Grading students' papers is a time-consuming task for the teacher.
no code implementations • JEP/TALN/RECITAL 2022 • Cyril Grouin
Inclusive French is a variety of standard French promoted to reflect an awareness of gender and identity.
no code implementations • JEP/TALN/RECITAL 2022 • Alexandra Benamar, Cyril Grouin, Meryl Bothua, Anne Vilnat
In this article, we study the gender stereotypes that exist in Word2Vec models.
2 code implementations • 27 Mar 2024 • Lisa Raithel, Hui-Syuan Yeh, Shuntaro Yada, Cyril Grouin, Thomas Lavergne, Aurélie Névéol, Patrick Paroubek, Philippe Thomas, Tomohiro Nishiyama, Sebastian Möller, Eiji Aramaki, Yuji Matsumoto, Roland Roller, Pierre Zweigenbaum
User-generated data sources have gained significance in uncovering Adverse Drug Reactions (ADRs), with an increasing number of discussions occurring in the digital world.
no code implementations • JEPTALNRECITAL 2020 • Rémi Cardon, Natalia Grabar, Cyril Grouin, Thierry Hamon
The third task consists of extracting ten categories of medical-domain information from the DEFT 2019 corpus of clinical cases.
no code implementations • LREC 2020 • Liyun Yan, Danni E, Mei Gan, Cyril Grouin, Mathieu Valette
Polarity classification (positive, negative or neutral opinion detection) is well developed in the field of opinion mining.
no code implementations • RANLP 2019 • Margot Mieskes, Karën Fort, Aurélie Névéol, Cyril Grouin, Kevin Cohen
With recent efforts in drawing attention to the task of replicating and/or reproducing results, for example in the context of COLING 2018 and various LREC workshops, the question arises of how the NLP community views the topic of replicability in general.
no code implementations • WS 2019 • Cyril Grouin, Natalia Grabar, Vincent Claveau, Thierry Hamon
Thus, we manually annotated a set of 717 files into four general categories (age, gender, outcome, and origin) for a total of 2,835 annotations.
no code implementations • JEPTALNRECITAL 2019 • Natalia Grabar, Cyril Grouin, Thierry Hamon, Vincent Claveau
This article presents the DEFT 2019 evaluation campaign on the analysis of clinical texts written in French.
no code implementations • JEPTALNRECITAL 2019 • Natalia Grabar, Cyril Grouin, Thierry Hamon, Vincent Claveau
To address this challenge, we present in this article the CAS corpus, which contains clinical cases of real or fictitious patients that we have compiled. These clinical cases, written in French, cover several medical specialties and therefore focus on different clinical situations.
no code implementations • JEPTALNRECITAL 2018 • Patrick Paroubek, Cyril Grouin, Patrice Bellot, Vincent Claveau, Iris Eshkol-Taravella, Amel Fraisse, Agata Jackiewicz, Jihen Karoui, Laura Monceaux, Juan-Manuel Torres-Moreno
This article presents the 2018 edition of the DEFT (Défi Fouille de Textes) evaluation campaign.
no code implementations • JEPTALNRECITAL 2018 • Cyril Grouin
We also study the possibility of recovering the level of detail of the named entity (NE) types in the original scheme from the simplified versions.
no code implementations • IJCNLP 2017 • Eva D'hondt, Cyril Grouin, Brigitte Grau
In this paper we present a novel approach to the automatic correction of OCR-induced orthographic errors in a given text.
no code implementations • JEPTALNRECITAL 2017 • Christopher Norman, Cyril Grouin, Thomas Lavergne, Aurélie Névéol, Pierre Zweigenbaum
We offer demonstrations of three tools developed by LIMSI in natural language processing applied to the biomedical domain: the detection of medical concepts in short texts, the categorization of scientific articles to assist in writing systematic reviews, and the anonymization of clinical texts.
no code implementations • WS 2016 • Thomas Lavergne, Aurélie Névéol, Aude Robert, Cyril Grouin, Grégoire Rey, Pierre Zweigenbaum
Very few datasets have been released for the evaluation of diagnosis coding with the International Classification of Diseases, and only one so far in a language other than English.
no code implementations • WS 2016 • Eva D'hondt, Cyril Grouin, Aurélie Névéol, Efstathios Stamatatos, Pierre Zweigenbaum
Electronic Health Records (EHRs) are increasingly available in modern health care institutions either through the direct creation of electronic documents in hospitals' health information systems, or through the digitization of historical paper records.
no code implementations • WS 2016 • Pierre Zweigenbaum, Cyril Grouin, Thomas Lavergne
In some plain text documents, end-of-line marks may or may not mark the boundary of a text unit (e.g., of a paragraph).
no code implementations • JEPTALNRECITAL 2016 • Pierre Zweigenbaum, Cyril Grouin, Thomas Lavergne
We propose a fully unsupervised method to determine whether an end of line should be treated as a simple space or as a true text-unit boundary, and we test it on a corpus of medical reports.
no code implementations • LREC 2016 • Cyril Grouin
We achieved our best results with a model trained on homogeneous corpora (only files composed of 2 columns) when classifying each token into left or right columns (overall F-measure of 0.968).
no code implementations • LREC 2016 • Cyril Grouin
In this paper, we presented the annotation propagation tool we designed to be used in conjunction with the BRAT rapid annotation tool.
no code implementations • LREC 2016 • François Morlane-Hondère, Cyril Grouin, Pierre Zweigenbaum
When trained on the output of the first classifier, the second classifier's performance is as follows: P = 0.683, R = 0.956, F1 = 0.797.
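The three figures above are consistent with one another if the reported F1 is the standard balanced F-measure, that is, the harmonic mean of precision and recall. That is an assumption here, since the snippet does not define the metric. A minimal sketch of the check follows; the function name `f1_score` is illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumes the standard definition F1 = 2PR / (P + R); the source
    snippet does not state which F-measure variant was used.
    """
    if precision + recall == 0:
        # Avoid division by zero when both scores are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract snippet.
f1 = f1_score(0.683, 0.956)
print(round(f1, 3))  # prints 0.797
```

Under this definition the computed value rounds to the reported 0.797, so the low precision is largely offset by the high recall.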
no code implementations • JEPTALNRECITAL 2015 • François Morlane-Hondère, Cyril Grouin, Véronique Moriceau, Pierre Zweigenbaum
In this article, we look at how the links between a medical treatment and a side effect are expressed.
no code implementations • JEPTALNRECITAL 2015 • François Morlane-Hondère, Cyril Grouin, Pierre Zweigenbaum
We believe that analyzing these variants could make it possible to model the errors made by forum users when they write drug names, and consequently to improve information retrieval systems.
no code implementations • JEPTALNRECITAL 2015 • Cyril Grouin, Véronique Moriceau, Sophie Rosset, Pierre Zweigenbaum
In this article, we present the methods we developed to analyze hospital reports written in English.
no code implementations • JEPTALNRECITAL 2014 • Thierry Hamon, Quentin Pleplé, Patrick Paroubek, Pierre Zweigenbaum, Cyril Grouin
no code implementations • LREC 2014 • Maria Goryainova, Cyril Grouin, Sophie Rosset, Ioana Vasilescu
The study provides an original standpoint of the speech transcription errors by focusing on the morpho-syntactic features of the erroneous chunks and of the surrounding left and right context.
no code implementations • LREC 2014 • Maria Evangelia Chatzimina, Cyril Grouin, Pierre Zweigenbaum
We design and test two syntax-based methods to produce word classes: one applies the Brown clustering algorithm to syntactic dependencies, the other collects latent categories created by a PCFG-LA parser.
no code implementations • LREC 2014 • Cyril Grouin
In this paper, we present the experiments we made to process entities from the biomedical domain.
no code implementations • LREC 2014 • Louise Deléger, Anne-Laure Ligozat, Cyril Grouin, Pierre Zweigenbaum, Aurélie Névéol
We present the annotation scheme as well as the results of a pilot annotation study covering 35 clinical documents in a variety of subfields and genres.
no code implementations • LREC 2014 • Daniel Luzzati, Cyril Grouin, Ioana Vasilescu, Martine Adda-Decker, Eric Bilinski, Nathalie Camelin, Juliette Kahn, Carole Lailler, Lori Lamel, Sophie Rosset
This paper is concerned with human assessments of the severity of errors in ASR outputs.
no code implementations • LREC 2014 • Cyril Grouin, Jeremy Leixa, Aurélie Névéol, Sophie Rosset, Xavier Tannier, Pierre Zweigenbaum
Overall, a total of 26,409 entity annotations were mapped to 5,797 unique UMLS concepts.
no code implementations • JEPTALNRECITAL 2012 • Patrick Paroubek, Pierre Zweigenbaum, Dominic Forest, Cyril Grouin
no code implementations • LREC 2012 • Olivier Galibert, Sophie Rosset, Cyril Grouin, Pierre Zweigenbaum, Ludovic Quintard
Within the framework of the Quaero project, we proposed a new definition of named entities, based upon an extension of the coverage of named entities as well as the structure of those named entities.
Named Entity Recognition (NER) • Optical Character Recognition (OCR)