Search Results for author: Richard Dufour

Found 39 papers, 10 papers with code

Mesures linguistiques automatiques pour l’évaluation des systèmes de Reconnaissance Automatique de la Parole (Automated linguistic measures for automatic speech recognition systems’ evaluation)

no code implementations • JEP/TALN/RECITAL 2022 • Thibault Bañeras Roux, Mickaël Rouvier, Jane Wottawa, Richard Dufour

Evaluating transcriptions produced by Automatic Speech Recognition (ASR) systems is a difficult and still-open problem, which generally comes down to considering only the word error rate (WER).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Correction automatique d’examens écrits par approche neuronale profonde et attention croisée bidirectionnelle (Deep Neural Networks and Bidirectional Cross-Attention for Automatic Answer Grading)

no code implementations • JEP/TALN/RECITAL 2022 • Yanis Labrak, Philippe Turcotte, Richard Dufour, Mickael Rouvier

We propose three classification systems based on features extracted from contextual word embeddings produced by a BERT model (CamemBERT).

Remplacement de mentions pour l’adaptation d’un corpus de reconnaissance d’entités nommées à un domaine cible (Mention replacement for adapting a named entity recognition dataset to a target domain)

no code implementations • JEP/TALN/RECITAL 2022 • Arthur Amalvy, Vincent Labatut, Richard Dufour

Named entity recognition is a well-studied natural language processing task that is useful in many applications.

named-entity-recognition Named Entity Recognition +1

CASIMIR: A Corpus of Scientific Articles enhanced with Multiple Author-Integrated Revisions

no code implementations • 1 Mar 2024 • Leane Jourdan, Florian Boudin, Nicolas Hernandez, Richard Dufour

Writing a scientific article is a challenging task as it is a highly codified and specific genre; consequently, proficiency in written communication is essential for effectively conveying research findings and ideas.

Sentence

Probing the Information Encoded in Neural-based Acoustic Models of Automatic Speech Recognition Systems

no code implementations • 29 Feb 2024 • Quentin Raymondaud, Mickael Rouvier, Richard Dufour

Following much research on neural network interpretability, we propose in this article a protocol that aims to determine which information is encoded in an ASR acoustic model (AM) and where it is located.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

How Important Is Tokenization in French Medical Masked Language Models?

no code implementations • 22 Feb 2024 • Yanis Labrak, Adrien Bazoge, Beatrice Daille, Mickael Rouvier, Richard Dufour

Subword tokenization has become the prevailing standard in the field of natural language processing (NLP) over recent years, primarily due to the widespread utilization of pre-trained language models.

DrBenchmark: A Large Language Understanding Evaluation Benchmark for French Biomedical Domain

1 code implementation • 20 Feb 2024 • Yanis Labrak, Adrien Bazoge, Oumaima El Khettari, Mickael Rouvier, Pacome Constant dit Beaufils, Natalia Grabar, Beatrice Daille, Solen Quiniou, Emmanuel Morin, Pierre-Antoine Gourraud, Richard Dufour

This limitation hampers the evaluation of the latest French biomedical models, as they are either assessed on a minimal number of tasks with non-standardized protocols or evaluated using general downstream tasks.

named-entity-recognition Named Entity Recognition +3

Language Model Adaptation to Specialized Domains through Selective Masking based on Genre and Topical Characteristics

1 code implementation • 19 Feb 2024 • Anas Belfathi, Ygor Gallina, Nicolas Hernandez, Richard Dufour, Laura Monceaux

Recent advances in pre-trained language modeling have facilitated significant progress across various natural language processing (NLP) tasks.

Language Modelling

BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains

no code implementations • 15 Feb 2024 • Yanis Labrak, Adrien Bazoge, Emmanuel Morin, Pierre-Antoine Gourraud, Mickael Rouvier, Richard Dufour

This marks the first large-scale multilingual evaluation of LLMs in the medical domain.

Quantization Question Answering

Learning to Rank Context for Named Entity Recognition Using a Synthetic Dataset

1 code implementation • 16 Oct 2023 • Arthur Amalvy, Vincent Labatut, Richard Dufour

Using this dataset, we train a neural context retriever based on a BERT model that is able to find relevant context for NER.

Language Modelling Large Language Model +5

A Zero-shot and Few-shot Study of Instruction-Finetuned Large Language Models Applied to Clinical and Biomedical Tasks

no code implementations • 22 Jul 2023 • Yanis Labrak, Mickael Rouvier, Richard Dufour

We evaluate four state-of-the-art instruction-tuned large language models (LLMs) -- ChatGPT, Flan-T5 UL2, Tk-Instruct, and Alpaca -- on a set of 13 real-world clinical and biomedical natural language processing (NLP) tasks in English, such as named-entity recognition (NER), question-answering (QA), relation extraction (RE), etc.

named-entity-recognition Named Entity Recognition +3

The Role of Global and Local Context in Named Entity Recognition

1 code implementation • 4 May 2023 • Arthur Amalvy, Vincent Labatut, Richard Dufour

Pre-trained transformer-based models have recently shown great performance when applied to Named Entity Recognition (NER).

named-entity-recognition Named Entity Recognition +1

FrenchMedMCQA: A French Multiple-Choice Question Answering Dataset for Medical domain

1 code implementation • LOUHI 2022 • Yanis Labrak, Adrien Bazoge, Richard Dufour, Mickael Rouvier, Emmanuel Morin, Béatrice Daille, Pierre-Antoine Gourraud

This paper introduces FrenchMedMCQA, the first publicly available Multiple-Choice Question Answering (MCQA) dataset in French for the medical domain.

Ranked #1 on Multiple Choice Question Answering (MCQA) on FrenchMedMCQA

Multiple-choice Multiple Choice Question Answering (MCQA)

DrBERT: A Robust Pre-trained Model in French for Biomedical and Clinical domains

no code implementations • 3 Apr 2023 • Yanis Labrak, Adrien Bazoge, Richard Dufour, Mickael Rouvier, Emmanuel Morin, Béatrice Daille, Pierre-Antoine Gourraud

In recent years, pre-trained language models (PLMs) have achieved the best performance on a wide range of natural language processing (NLP) tasks.

Text revision in Scientific Writing Assistance: An Overview

1 code implementation • 29 Mar 2023 • Léane Jourdan, Florian Boudin, Richard Dufour, Nicolas Hernandez

Writing a scientific article is a challenging task as it is a highly codified genre.

Data Augmentation for Robust Character Detection in Fantasy Novels

1 code implementation • 9 Feb 2023 • Arthur Amalvy, Vincent Labatut, Richard Dufour

Named Entity Recognition (NER) is a low-level task often used as a foundation for solving higher level NLP problems.

Data Augmentation named-entity-recognition +2

ANTILLES: An Open French Linguistically Enriched Part-of-Speech Corpus

1 code implementation • International Conference on Text, Speech and Dialogue (TSD) 2022 • Yanis Labrak, Richard Dufour

Part-of-speech (POS) tagging is a classical natural language processing (NLP) task.

Ranked #1 on Part-Of-Speech Tagging on ANTILLES

Part-Of-Speech Tagging POS +1

Multi-label classification for biomedical literature: an overview of the BioCreative VII LitCovid Track for COVID-19 literature topic annotations

no code implementations • 20 Apr 2022 • Qingyu Chen, Alexis Allot, Robert Leaman, Rezarta Islamaj Doğan, Jingcheng Du, Li Fang, Kai Wang, Shuo Xu, Yuefu Zhang, Parsa Bagherzadeh, Sabine Bergler, Aakash Bhatnagar, Nidhir Bhavsar, Yung-Chun Chang, Sheng-Jie Lin, Wentai Tang, Hongtong Zhang, Ilija Tavchioski, Senja Pollak, Shubo Tian, Jinfeng Zhang, Yulia Otmakhova, Antonio Jimeno Yepes, Hang Dong, Honghan Wu, Richard Dufour, Yanis Labrak, Niladri Chatterjee, Kushagri Tandon, Fréjus Laleye, Loïc Rakotoson, Emmanuele Chersoni, Jinghang Gu, Annemarie Friedrich, Subhash Chandra Pujari, Mariia Chizhikova, Naveen Sivadasan, Zhiyong Lu

To close the gap, we organized the BioCreative LitCovid track to call for a community effort to tackle automated topic annotation for COVID-19 literature.

Benchmarking Multi-Label Classification

Label Refining: a semi-supervised method to extract voice characteristics without ground truth

no code implementations • 29 Sep 2021 • Mathias Quillot, Richard Dufour, Jean-François Bonastre

To address this problem, we propose a new semi-supervised learning method entitled Label Refining that consists in extracting refined labels (e.g., vocal characteristics) from known initial labels (e.g., character played in a recording).

La voix actée : pratiques, enjeux, applications (Acted voice: practices, challenges, applications)

no code implementations • JEPTALNRECITAL 2020 • Mathias Quillot, Lauriane Guillou, Adrien Gresse, Rafaël Ferro, Raphaël Röth, Damien Malinas, Richard Dufour, Axel Roebel, Nicolas Obin, Jean-François Bonastre, Emmanuel Ethis

Acted voice represents a major challenge for future voice interfaces, with extremely significant application potential for the digital transformation of the culture and communication sectors, such as voice production or post-production for series or cinema.

Cultural Vocal Bursts Intensity Prediction

Apprentissage automatique de représentation de voix à l'aide d'une distillation de la connaissance pour le casting vocal (Learning voice representation using knowledge distillation for automatic voice casting)

no code implementations • JEPTALNRECITAL 2020 • Adrien Gresse, Mathias Quillot, Richard Dufour, Jean-François Bonastre

Experiments conducted on voice excerpts from video games show a significant improvement of the p-vector approach, with knowledge distillation, over an x-vector representation, the state of the art in speaker recognition.

Knowledge Distillation

A Multimodal Educational Corpus of Oral Courses: Annotation, Analysis and Case Study

no code implementations • LREC 2020 • Salima Mdhaffar, Yannick Estève, Antoine Laurent, Nicolas Hernandez, Richard Dufour, Delphine Charlet, Geraldine Damnati, Solen Quiniou, Nathalie Camelin

The use cases concern scientific fields from both speech and text processing, with language model adaptation, thematic segmentation and transcription to slide alignment.

Language Modelling

WAC: A Corpus of Wikipedia Conversations for Online Abuse Detection

1 code implementation • LREC 2020 • Noé Cecillon, Vincent Labatut, Richard Dufour, Georges Linares

This large corpus of more than 380k annotated messages opens perspectives for online abuse detection and especially for context-based approaches.

Abuse Detection Benchmarking

Abusive Language Detection in Online Conversations by Combining Content-and Graph-based Features

1 code implementation • 20 May 2019 • Noé Cecillon, Vincent Labatut, Richard Dufour, Georges Linarès

In recent years, online social networks have allowed worldwide users to meet and discuss.

Abuse Detection

Conversational Networks for Automatic Online Moderation

no code implementations • 31 Jan 2019 • Etienne Papegnies, Vincent Labatut, Richard Dufour, Georges Linares

We identify the most appropriate network extraction parameters and discuss the discriminative power of our features, relative to their topological and temporal nature.

Abuse Detection

Graph-based Features for Automatic Online Abuse Detection

no code implementations • 3 Aug 2017 • Etienne Papegnies, Vincent Labatut, Richard Dufour, Georges Linares

While online communities have become increasingly important over the years, the moderation of user-generated content is still performed mostly manually.

Abuse Detection

Automatic Text Summarization Approaches to Speed up Topic Model Learning Process

no code implementations • 20 Mar 2017 • Mohamed Morchid, Juan-Manuel Torres-Moreno, Richard Dufour, Javier Ramírez-Rodríguez, Georges Linarès

One of the main difficulties in using topic models on huge data collections is the material resources (CPU time and memory) required for model estimation.

Information Retrieval Retrieval +1

Systèmes du LIA à DEFT'13

no code implementations • 21 Feb 2017 • Xavier Bost, Ilaria Brunetti, Luis Adrián Cabrera-Diego, Jean-Valère Cossu, Andréa Linhares, Mohamed Morchid, Juan-Manuel Torres-Moreno, Marc El-Bèze, Richard Dufour

The 2013 Défi de Fouille de Textes (DEFT) campaign focuses on two types of language analysis tasks: document classification and information extraction in the specialized domain of cooking recipes.

Document Classification General Classification

Parallel Long Short-Term Memory for Multi-stream Classification

no code implementations • 11 Feb 2017 • Mohamed Bouaziz, Mohamed Morchid, Richard Dufour, Georges Linarès, Renato de Mori

Nevertheless, these RNNs process a single input stream in one (LSTM) or two (Bidirectional LSTM) directions.

Classification General Classification

Un Sous-espace Thématique Latent pour la Compréhension du Langage Parlé (A Latent Topic-based Subspace for Spoken Language Understanding)

no code implementations • JEPTALNRECITAL 2016 • Mohamed Bouaziz, Mohamed Morchid, Pierre-Michel Bousquet, Richard Dufour, Killian Janod, Waad Ben Kheder, Georges Linarès

Spoken language understanding applications perform worse when automatically transcribed documents contain a high word error rate.

Spoken Language Understanding

Auto-encodeurs pour la compréhension de documents parlés (Auto-encoders for Spoken Document Understanding)

no code implementations • JEPTALNRECITAL 2016 • Killian Janod, Mohamed Morchid, Richard Dufour, Georges Linarès, Renato de Mori

Document representations based on neural network approaches have shown significant improvements in many natural language processing tasks.

document understanding

Un Corpus de Flux TV Annotés pour la Prédiction de Genres (A Genre Annotated Corpus of French Multi-channel TV Streams for Genre Prediction)

no code implementations • JEPTALNRECITAL 2016 • Mohamed Bouaziz, Mohamed Morchid, Richard Dufour, Georges Linarès, Prosper Correa

This article presents a genre prediction method for television programs covering 2 days of broadcasting from 4 French TV channels, structured into genre-annotated programs.

Apport de l'information temporelle des contextes pour la représentation vectorielle continue des mots

no code implementations • JEPTALNRECITAL 2015 • Killian Janod, Mohamed Morchid, Richard Dufour, Georges Linares

These approaches are implemented through a neural network: the CBOW architecture seeks to predict a word given its context, whereas the Skip-Gram architecture predicts a context given a word.

Initialisation de Réseaux de Neurones à l'aide d'un Espace Thématique

no code implementations • JEPTALNRECITAL 2015 • Mohamed Morchid, Richard Dufour, Georges Linarès

The proposed method consists of configuring the topology of an ANN and initializing its connections using the previously learned topic spaces.

An I-vector Based Approach to Compact Multi-Granularity Topic Spaces Representation of Textual Documents

no code implementations • EMNLP 2014 • Mohamed Morchid, Mohamed Bouallegue, Richard Dufour, Georges Linarès, Driss Matrouf, Renato de Mori

Speaker Recognition Speech Recognition

Characterizing and Predicting Bursty Events: The Buzz Case Study on Twitter

no code implementations • LREC 2014 • Mohamed Morchid, Georges Linarès, Richard Dufour

The prediction of bursty events on the Internet is a challenging task.

A LDA-Based Topic Classification Approach From Highly Imperfect Automatic Transcriptions

no code implementations • LREC 2014 • Mohamed Morchid, Richard Dufour, Georges Linarès

Although current transcription systems can achieve high recognition performance, they still have considerable difficulty transcribing speech in very noisy environments.

General Classification Information Retrieval +2

Combinaison d'approches pour la reconnaissance du rôle des locuteurs (Combination of approaches for speaker role recognition) [in French]

no code implementations • JEPTALNRECITAL 2012 • Richard Dufour, Antoine Laurent, Yannick Estève

Détection et caractérisation des régions d'erreurs dans des transcriptions de contenus multimédia : application à la recherche des noms de personnes (Error region detection and characterization in transcriptions of multimedia documents: application to person name search) [in French]

no code implementations • JEPTALNRECITAL 2012 • Richard Dufour, Géraldine Damnati, Delphine Charlet
