Search Results for author: Sophie Rosset

Found 55 papers, 9 papers with code

Analyzing BERT Cross-lingual Transfer Capabilities in Continual Sequence Labeling

1 code implementation MMMPIE (COLING) 2022 Juan Manuel Coria, Mathilde Veron, Sahar Ghannay, Guillaume Bernard, Hervé Bredin, Olivier Galibert, Sophie Rosset

Knowledge transfer between neural language models is a widely used technique that has proven to improve performance in a multitude of natural language tasks, in particular with the recent rise of large pre-trained language models like BERT.

Continual Learning Cross-Lingual Transfer +6

Small Language Models are Good Too: An Empirical Study of Zero-Shot Classification

no code implementations 17 Apr 2024 Pierre Lepagnol, Thomas Gerald, Sahar Ghannay, Christophe Servan, Sophie Rosset

This study is part of the debate on the efficiency of large versus small language models for text classification by prompting. We assess the performance of small language models in zero-shot text classification, challenging the prevailing dominance of large models. Across 15 datasets, our investigation benchmarks language models from 77M to 40B parameters using different architectures and scoring functions.

Text Classification +2

mALBERT: Is a Compact Multilingual BERT Model Still Worth It?

no code implementations 27 Mar 2024 Christophe Servan, Sahar Ghannay, Sophie Rosset

Within the current trend of Pretrained Language Models (PLMs), more and more criticisms are emerging about the ethical and ecological impact of such models.

Language Modelling Question Answering

On the Usability of Transformers-based models for a French Question-Answering task

no code implementations RANLP 2021 Oralie Cattan, Christophe Servan, Sophie Rosset

In this paper, we survey the state of the art of the efforts dedicated to the usability of Transformer-based models and propose to evaluate these improvements on question-answering performance for French, a language with few resources.

Cross-Lingual Transfer Data Augmentation +2

Benchmarking Transformers-based models on French Spoken Language Understanding tasks

no code implementations 19 Jul 2022 Oralie Cattan, Sahar Ghannay, Christophe Servan, Sophie Rosset

In this paper, we propose a unified benchmark focused on evaluating model quality and ecological impact on two well-known French spoken language understanding tasks.

Benchmarking Spoken Language Understanding

On the cross-lingual transferability of multilingual prototypical models across NLU tasks

no code implementations ACL (MetaNLP) 2021 Oralie Cattan, Christophe Servan, Sophie Rosset

Supervised deep learning-based approaches have been applied to task-oriented dialog and have proven to be effective for limited domain and language applications when a sufficient number of training examples are available.

Few-Shot Learning Natural Language Understanding +1

Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation

1 code implementation 14 Sep 2021 Juan M. Coria, Hervé Bredin, Sahar Ghannay, Sophie Rosset

We propose to address online speaker diarization as a combination of incremental clustering and local diarization applied to a rolling buffer updated every 500ms.

Clustering Segmentation +2

Evaluate On-the-job Learning Dialogue Systems and a Case Study for Natural Language Understanding

no code implementations 26 Feb 2021 Mathilde Veron, Sophie Rosset, Olivier Galibert, Guillaume Bernard

On-the-job learning consists in continuously learning while being used in production, in an open environment, meaning that the system has to deal on its own with situations and elements never seen before.

Natural Language Understanding

LIMSI_UPV at SemEval-2020 Task 9: Recurrent Convolutional Neural Network for Code-mixed Sentiment Analysis

1 code implementation 30 Aug 2020 Somnath Banerjee, Sahar Ghannay, Sophie Rosset, Anne Vilnat, Paolo Rosso

This paper describes the participation of the LIMSI_UPV team in SemEval-2020 Task 9: Sentiment Analysis for Code-Mixed Social Media Text.

Sentiment Analysis

A Metric Learning Approach to Misogyny Categorization

no code implementations WS 2020 Juan Manuel Coria, Sahar Ghannay, Sophie Rosset, Hervé Bredin

The task of automatic misogyny identification and categorization has not received as much attention as other natural language tasks have, even though it is crucial for identifying hate speech in social Internet interactions.

Metric Learning Sentence +2

A Comparison of Metric Learning Loss Functions for End-To-End Speaker Verification

1 code implementation 31 Mar 2020 Juan M. Coria, Hervé Bredin, Sahar Ghannay, Sophie Rosset

Despite the growing popularity of metric learning approaches, very little work has attempted to perform a fair comparison of these techniques for speaker verification.

Metric Learning Speaker Verification

DiaBLa: A Corpus of Bilingual Spontaneous Written Dialogues for Machine Translation

2 code implementations 30 May 2019 Rachel Bawden, Sophie Rosset, Thomas Lavergne, Eric Bilinski

We provide a preliminary analysis of the corpus to confirm that the participants' judgments reveal perceptible differences in MT quality between the two MT systems used.

Machine Translation Sentence +1

Survey on Evaluation Methods for Dialogue Systems

no code implementations 10 May 2019 Jan Deriu, Alvaro Rodrigo, Arantxa Otegi, Guillermo Echegoyen, Sophie Rosset, Eneko Agirre, Mark Cieliebak

We cover each class by introducing the main technologies developed for the dialogue systems and then by presenting the evaluation methods regarding this class.

Question Answering Task-Oriented Dialogue Systems

Natural language understanding for task oriented dialog in the biomedical domain in a low resources context

no code implementations 23 Nov 2018 Antoine Neuraz, Leonardo Campillos Llanos, Anita Burgun, Sophie Rosset

In the biomedical domain, the lack of sharable datasets often limits the possibility of developing natural language processing systems, especially dialogue applications and natural language understanding models.

Data Augmentation General Classification +6

Étiquetage en parties du discours de langues peu dotées par spécialisation des plongements lexicaux (POS tagging for low-resource languages by adapting word embeddings)

no code implementations JEPTALNRECITAL 2018 Pierre Magistry, Anne-Laure Ligozat, Sophie Rosset

This article presents a new part-of-speech tagging method suited to low-resource languages: the definition of the context used to build the word embeddings is adapted to the task, and new vectors are created for unknown words.

POS POS Tagging +1

Detecting context-dependent sentences in parallel corpora

no code implementations JEPTALNRECITAL 2018 Rachel Bawden, Thomas Lavergne, Sophie Rosset

In this article, we provide several approaches to the automatic identification of parallel sentences that require sentence-external linguistic context to be correctly translated.

Machine Translation Sentence +1

Automatic classification of doctor-patient questions for a virtual patient record query task

no code implementations WS 2017 Leonardo Campillos Llanos, Sophie Rosset, Pierre Zweigenbaum

We present the work-in-progress of automating the classification of doctor-patient questions in the context of a simulated consultation with a virtual patient.

BIG-bench Machine Learning Dialogue Management +4

Apprendre des représentations jointes de mots et d'entités pour la désambiguïsation d'entités (Combining Word and Entity Embeddings for Entity Linking)

no code implementations JEPTALNRECITAL 2017 José Moreno, Romaric Besançon, Romain Beaumont, Eva D'hondt, Anne-Laure Ligozat, Sophie Rosset, Xavier Tannier, Brigitte Grau

Entity disambiguation (or entity linking), which consists in linking entity mentions in a text to entities in a knowledge base, is a problem that arises, among other settings, in the automatic population of knowledge bases from text.

Entity Embeddings Entity Linking

Utterance Retrieval Based on Recurrent Surface Text Patterns

1 code implementation 8 Apr 2017 Guillaume Dubuisson Duplessis, Franck Charras, Vincent Letard, Anne-Laure Ligozat, Sophie Rosset

This paper investigates the use of recurrent surface text patterns to represent and index open-domain dialogue utterances for a retrieval system that can be embedded in a conversational agent.


Un système automatique de sélection de réponse en domaine ouvert intégrable à un système de dialogue social (An automatic open-domain response selection system integrable to a social dialogue system)

no code implementations JEPTALNRECITAL 2016 Franck Charras, Guillaume Dubuisson Duplessis, Vincent Letard, Anne-Laure Ligozat, Sophie Rosset

This demonstration presents an open-domain dialogue system that uses a base of dialogue examples, automatically built from a corpus of subtitles, to manage a "chatbot"-style social dialogue.


Comparaison de listes d'erreurs de transcription automatique de la parole : quelle complémentarité entre les différentes métriques ? (Comparing error lists for ASR systems: contribution of different metrics)

no code implementations JEPTALNRECITAL 2016 Olivier Galibert, Juliette Kahn, Sophie Rosset

The work we present here falls within the field of evaluating automatic speech recognition systems with a view to their use in a downstream task, here named entity recognition.

Estimation de la qualité d'un système de reconnaissance de la parole pour une tâche de compréhension (Quality estimation of a Speech Recognition System for a Spoken Language Understanding task)

no code implementations JEPTALNRECITAL 2016 Olivier Galibert, Nathalie Camelin, Paul Deléglise, Sophie Rosset

Here we compare different metrics, notably WER, NE-WER and ATENE, a recently proposed metric, for evaluating speech recognition systems given a named entity recognition task.

Speech Recognition +1

Generating Task-Pertinent sorted Error Lists for Speech Recognition

no code implementations LREC 2016 Olivier Galibert, Mohamed Ameur Ben Jannet, Juliette Kahn, Sophie Rosset

In the context of Automatic Speech Recognition (ASR) used as a first step towards Named Entity Recognition (NER) in speech, error seriousness is usually determined by their frequency, due to the use of the WER as metric to evaluate the ASR output, despite the emergence of more relevant measures in the literature.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Managing Linguistic and Terminological Variation in a Medical Dialogue System

no code implementations LREC 2016 Leonardo Campillos Llanos, Dhouha Bouamor, Pierre Zweigenbaum, Sophie Rosset

We introduce a dialogue task between a virtual patient and a doctor where the dialogue system, playing the patient part in a simulated consultation, must reconcile a specialized level, to understand what the doctor says, and a lay level, to output realistic patient-language utterances.

Sentence Spoken Language Understanding

Purely Corpus-based Automatic Conversation Authoring

no code implementations LREC 2016 Guillaume Dubuisson Duplessis, Vincent Letard, Anne-Laure Ligozat, Sophie Rosset

This system is used as a chatterbot system to collect a corpus of 41 open-domain textual dialogues with 27 human participants.

Transfer-Based Learning-to-Rank Assessment of Medical Term Technicality

no code implementations LREC 2016 Dhouha Bouamor, Leonardo Campillos Llanos, Anne-Laure Ligozat, Sophie Rosset, Pierre Zweigenbaum

While measuring the readability of texts has been a long-standing research topic, assessing the technicality of terms has only been addressed more recently and mostly for the English language.

Language Modelling Learning-To-Rank

Un patient virtuel dialogant

no code implementations JEPTALNRECITAL 2015 Leonardo Campillos, Dhouha Bouamor, Éric Bilinski, Anne-Laure Ligozat, Pierre Zweigenbaum, Sophie Rosset

The demonstrator we describe here is a prototype dialogue system whose goal is to simulate a patient.

Identification de facteurs de risque pour des patients diabétiques à partir de comptes-rendus cliniques par des approches hybrides

no code implementations JEPTALNRECITAL 2015 Cyril Grouin, Véronique Moriceau, Sophie Rosset, Pierre Zweigenbaum

In this article, we present the methods we developed to analyze hospital reports written in English.

ETER : a new metric for the evaluation of hierarchical named entity recognition

no code implementations LREC 2014 Mohamed Ben Jannet, Martine Adda-Decker, Olivier Galibert, Juliette Kahn, Sophie Rosset

We then introduce a new metric, the Entity Tree Error Rate (ETER), to evaluate hierarchical and structured named entity detection, classification and decomposition.

Entity Extraction using GAN General Classification +3

Morpho-Syntactic Study of Errors from Speech Recognition System

no code implementations LREC 2014 Maria Goryainova, Cyril Grouin, Sophie Rosset, Ioana Vasilescu

The study provides an original perspective on speech transcription errors by focusing on the morpho-syntactic features of the erroneous chunks and of the surrounding left and right context.

Named Entity Recognition (NER) POS +3

Tree-Structured Named Entity Recognition on OCR Data: Analysis, Processing and Results

no code implementations LREC 2012 Marco Dinarelli, Sophie Rosset

We evaluate our procedure for preprocessing OCR-ized data in two ways: in terms of perplexity and OOV rate of a language model on development and evaluation data, and in terms of the performance of the named entity detection system on the preprocessed data.

Language Modelling Named Entity Recognition +3

Extended Named Entities Annotation on OCRed Documents: From Corpus Constitution to Evaluation Campaign

no code implementations LREC 2012 Olivier Galibert, Sophie Rosset, Cyril Grouin, Pierre Zweigenbaum, Ludovic Quintard

Within the framework of the Quaero project, we proposed a new definition of named entities, based upon an extension of the coverage of named entities as well as the structure of those named entities.

Named Entity Recognition (NER) Optical Character Recognition (OCR)
