Search Results for author: Richard Dufour

Found 39 papers, 10 papers with code

CASIMIR: A Corpus of Scientific Articles enhanced with Multiple Author-Integrated Revisions

no code implementations 1 Mar 2024 Leane Jourdan, Florian Boudin, Nicolas Hernandez, Richard Dufour

Writing a scientific article is a challenging task as it is a highly codified and specific genre; consequently, proficiency in written communication is essential for effectively conveying research findings and ideas.

Sentence

Probing the Information Encoded in Neural-based Acoustic Models of Automatic Speech Recognition Systems

no code implementations 29 Feb 2024 Quentin Raymondaud, Mickael Rouvier, Richard Dufour

Following much research on neural network interpretability, we propose in this article a protocol that aims to determine which information is located in an ASR acoustic model (AM), and where.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

How Important Is Tokenization in French Medical Masked Language Models?

no code implementations 22 Feb 2024 Yanis Labrak, Adrien Bazoge, Beatrice Daille, Mickael Rouvier, Richard Dufour

Subword tokenization has become the prevailing standard in the field of natural language processing (NLP) over recent years, primarily due to the widespread utilization of pre-trained language models.

DrBenchmark: A Large Language Understanding Evaluation Benchmark for French Biomedical Domain

1 code implementation 20 Feb 2024 Yanis Labrak, Adrien Bazoge, Oumaima El Khettari, Mickael Rouvier, Pacome Constant dit Beaufils, Natalia Grabar, Beatrice Daille, Solen Quiniou, Emmanuel Morin, Pierre-Antoine Gourraud, Richard Dufour

This limitation hampers the evaluation of the latest French biomedical models, as they are either assessed on a minimal number of tasks with non-standardized protocols or evaluated using general downstream tasks.

named-entity-recognition Named Entity Recognition +3

Language Model Adaptation to Specialized Domains through Selective Masking based on Genre and Topical Characteristics

1 code implementation 19 Feb 2024 Anas Belfathi, Ygor Gallina, Nicolas Hernandez, Richard Dufour, Laura Monceaux

Recent advances in pre-trained language modeling have facilitated significant progress across various natural language processing (NLP) tasks.

Language Modelling

Learning to Rank Context for Named Entity Recognition Using a Synthetic Dataset

1 code implementation 16 Oct 2023 Arthur Amalvy, Vincent Labatut, Richard Dufour

Using this dataset, we train a neural context retriever based on a BERT model that is able to find relevant context for NER.

Language Modelling Large Language Model +5

A Zero-shot and Few-shot Study of Instruction-Finetuned Large Language Models Applied to Clinical and Biomedical Tasks

no code implementations 22 Jul 2023 Yanis Labrak, Mickael Rouvier, Richard Dufour

We evaluate four state-of-the-art instruction-tuned large language models (LLMs) -- ChatGPT, Flan-T5 UL2, Tk-Instruct, and Alpaca -- on a set of 13 real-world clinical and biomedical natural language processing (NLP) tasks in English, such as named-entity recognition (NER), question-answering (QA), relation extraction (RE), etc.

named-entity-recognition Named Entity Recognition +3

The Role of Global and Local Context in Named Entity Recognition

1 code implementation 4 May 2023 Arthur Amalvy, Vincent Labatut, Richard Dufour

Pre-trained transformer-based models have recently shown great performance when applied to Named Entity Recognition (NER).

named-entity-recognition Named Entity Recognition +1

DrBERT: A Robust Pre-trained Model in French for Biomedical and Clinical domains

no code implementations 3 Apr 2023 Yanis Labrak, Adrien Bazoge, Richard Dufour, Mickael Rouvier, Emmanuel Morin, Béatrice Daille, Pierre-Antoine Gourraud

In recent years, pre-trained language models (PLMs) have achieved the best performance on a wide range of natural language processing (NLP) tasks.

Text revision in Scientific Writing Assistance: An Overview

1 code implementation 29 Mar 2023 Léane Jourdan, Florian Boudin, Richard Dufour, Nicolas Hernandez

Writing a scientific article is a challenging task as it is a highly codified genre.

Data Augmentation for Robust Character Detection in Fantasy Novels

1 code implementation 9 Feb 2023 Arthur Amalvy, Vincent Labatut, Richard Dufour

Named Entity Recognition (NER) is a low-level task often used as a foundation for solving higher level NLP problems.

Data Augmentation named-entity-recognition +2

Label Refining: a semi-supervised method to extract voice characteristics without ground truth

no code implementations 29 Sep 2021 Mathias Quillot, Richard Dufour, Jean-François Bonastre

To address this problem, we propose a new semi-supervised learning method entitled Label Refining that consists in extracting refined labels (e.g., vocal characteristics) from known initial labels (e.g., character played in a recording).

La voix actée : pratiques, enjeux, applications (Acted voice: practices, challenges, applications)

no code implementations JEPTALNRECITAL 2020 Mathias Quillot, Lauriane Guillou, Adrien Gresse, Rafaël Ferro, Raphaël Röth, Damien Malinas, Richard Dufour, Axel Roebel, Nicolas Obin, Jean-François Bonastre, Emmanuel Ethis

Acted voice represents a major challenge for future voice interfaces, with extremely significant application potential for the digital transformation of the culture and communication sectors, such as voice production or post-production for series or cinema.

Cultural Vocal Bursts Intensity Prediction

Apprentissage automatique de représentation de voix à l'aide d'une distillation de la connaissance pour le casting vocal (Learning voice representation using knowledge distillation for automatic voice casting)

no code implementations JEPTALNRECITAL 2020 Adrien Gresse, Mathias Quillot, Richard Dufour, Jean-François Bonastre

Experiments conducted on voice excerpts from video games show a significant improvement of the p-vector approach, with knowledge distillation, over an x-vector representation, the state of the art in speaker recognition.

Knowledge Distillation

A Multimodal Educational Corpus of Oral Courses: Annotation, Analysis and Case Study

no code implementations LREC 2020 Salima Mdhaffar, Yannick Estève, Antoine Laurent, Nicolas Hernandez, Richard Dufour, Delphine Charlet, Geraldine Damnati, Solen Quiniou, Nathalie Camelin

The use cases concern scientific fields from both speech and text processing, with language model adaptation, thematic segmentation and transcription to slide alignment.

Language Modelling

WAC: A Corpus of Wikipedia Conversations for Online Abuse Detection

1 code implementation LREC 2020 Noé Cecillon, Vincent Labatut, Richard Dufour, Georges Linares

This large corpus of more than 380k annotated messages opens perspectives for online abuse detection and especially for context-based approaches.

Abuse Detection Benchmarking

Conversational Networks for Automatic Online Moderation

no code implementations 31 Jan 2019 Etienne Papegnies, Vincent Labatut, Richard Dufour, Georges Linares

We identify the most appropriate network extraction parameters and discuss the discriminative power of our features, relative to their topological and temporal nature.

Abuse Detection

Graph-based Features for Automatic Online Abuse Detection

no code implementations 3 Aug 2017 Etienne Papegnies, Vincent Labatut, Richard Dufour, Georges Linares

While online communities have become increasingly important over the years, the moderation of user-generated content is still performed mostly manually.

Abuse Detection

Automatic Text Summarization Approaches to Speed up Topic Model Learning Process

no code implementations 20 Mar 2017 Mohamed Morchid, Juan-Manuel Torres-Moreno, Richard Dufour, Javier Ramírez-Rodríguez, Georges Linarès

One of the main difficulties in using topic models on huge data collections is related to the material resources (CPU time and memory) required for model estimation.

Information Retrieval Retrieval +1

Systèmes du LIA à DEFT'13

no code implementations 21 Feb 2017 Xavier Bost, Ilaria Brunetti, Luis Adrián Cabrera-Diego, Jean-Valère Cossu, Andréa Linhares, Mohamed Morchid, Juan-Manuel Torres-Moreno, Marc El-Bèze, Richard Dufour

The 2013 Défi de Fouille de Textes (DEFT) campaign focuses on two types of language analysis tasks: document classification and information extraction in the specialized domain of cooking recipes.

Document Classification General Classification

Auto-encodeurs pour la compréhension de documents parlés (Auto-encoders for Spoken Document Understanding)

no code implementations JEPTALNRECITAL 2016 Killian Janod, Mohamed Morchid, Richard Dufour, Georges Linarès, Renato de Mori

Document representations based on neural network approaches have shown significant improvements in many natural language processing tasks.

document understanding

Un Corpus de Flux TV Annotés pour la Prédiction de Genres (A Genre Annotated Corpus of French Multi-channel TV Streams for Genre Prediction)

no code implementations JEPTALNRECITAL 2016 Mohamed Bouaziz, Mohamed Morchid, Richard Dufour, Georges Linarès, Prosper Correa

This article presents a method for predicting the genres of television programs, covering 2 days of broadcasts from 4 French TV channels structured into genre-annotated programs.

Apport de l'information temporelle des contextes pour la représentation vectorielle continue des mots

no code implementations JEPTALNRECITAL 2015 Killian Janod, Mohamed Morchid, Richard Dufour, Georges Linares

These approaches are implemented through a neural network: the CBOW architecture seeks to predict a word given its context, whereas the Skip-Gram architecture predicts a context given a word.

Initialisation de Réseaux de Neurones à l'aide d'un Espace Thématique

no code implementations JEPTALNRECITAL 2015 Mohamed Morchid, Richard Dufour, Georges Linarès

The proposed method consists in configuring the topology of an ANN and initializing its connections using the previously learned topic spaces.

A LDA-Based Topic Classification Approach From Highly Imperfect Automatic Transcriptions

no code implementations LREC 2014 Mohamed Morchid, Richard Dufour, Georges Linarès

Although current transcription systems can achieve high recognition performance, they still have considerable difficulty transcribing speech in very noisy environments.

General Classification Information Retrieval +2
