Search Results for author: Sophie Rosset

Found 55 papers, 9 papers with code

Attention Modulation for Zero-Shot Cross-Domain Dialogue State Tracking

1 code implementation • COLING (CODI, CRAC) 2022 • Mathilde Veron, Olivier Galibert, Guillaume Bernard, Sophie Rosset

Dialog state tracking (DST) is a core step for task-oriented dialogue systems aiming to track the user’s current goal during a dialogue.

dialog state tracking Dialogue State Tracking +1

Analyzing BERT Cross-lingual Transfer Capabilities in Continual Sequence Labeling

1 code implementation • MMMPIE (COLING) 2022 • Juan Manuel Coria, Mathilde Veron, Sahar Ghannay, Guillaume Bernard, Hervé Bredin, Olivier Galibert, Sophie Rosset

Knowledge transfer between neural language models is a widely used technique that has proven to improve performance in a multitude of natural language tasks, in particular with the recent rise of large pre-trained language models like BERT.

Continual Learning Cross-Lingual Transfer +6

Small Language Models are Good Too: An Empirical Study of Zero-Shot Classification

no code implementations • 17 Apr 2024 • Pierre Lepagnol, Thomas Gerald, Sahar Ghannay, Christophe Servan, Sophie Rosset

This study is part of the debate on the efficiency of large versus small language models for text classification by prompting. We assess the performance of small language models in zero-shot text classification, challenging the prevailing dominance of large models. Across 15 datasets, our investigation benchmarks language models from 77M to 40B parameters using different architectures and scoring functions.

text-classification Text Classification +2

New Semantic Task for the French Spoken Language Understanding MEDIA Benchmark

1 code implementation • 28 Mar 2024 • Nadège Alavoine, Gaëlle Laperriere, Christophe Servan, Sahar Ghannay, Sophie Rosset

A combination of multiple datasets, including the MEDIA dataset, was suggested for training this joint model.

intent-classification Intent Classification +4

mALBERT: Is a Compact Multilingual BERT Model Still Worth It?

no code implementations • 27 Mar 2024 • Christophe Servan, Sahar Ghannay, Sophie Rosset

Within the current trend of Pretrained Language Models (PLM), more and more criticisms are emerging about the ethical and ecological impact of such models.

Language Modelling Question Answering

On the Usability of Transformers-based models for a French Question-Answering task

no code implementations • RANLP 2021 • Oralie Cattan, Christophe Servan, Sophie Rosset

In this paper, we review the state of the art of the efforts dedicated to the usability of Transformer-based models and propose to evaluate these improvements on question-answering performance for French, a language with few resources.

Cross-Lingual Transfer Data Augmentation +2

Benchmarking Transformers-based models on French Spoken Language Understanding tasks

no code implementations • 19 Jul 2022 • Oralie Cattan, Sahar Ghannay, Christophe Servan, Sophie Rosset

In this paper, we propose a unified benchmark, focused on evaluating model quality and ecological impact on two well-known French spoken language understanding tasks.

Benchmarking Spoken Language Understanding

On the cross-lingual transferability of multilingual prototypical models across NLU tasks

no code implementations • ACL (MetaNLP) 2021 • Oralie Cattan, Christophe Servan, Sophie Rosset

Supervised deep learning-based approaches have been applied to task-oriented dialog and have proven to be effective for limited domain and language applications when a sufficient number of training examples are available.

Few-Shot Learning Natural Language Understanding +1

Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation

1 code implementation • 14 Sep 2021 • Juan M. Coria, Hervé Bredin, Sahar Ghannay, Sophie Rosset

We propose to address online speaker diarization as a combination of incremental clustering and local diarization applied to a rolling buffer updated every 500ms.

Clustering Segmentation +2

Evaluate On-the-job Learning Dialogue Systems and a Case Study for Natural Language Understanding

no code implementations • 26 Feb 2021 • Mathilde Veron, Sophie Rosset, Olivier Galibert, Guillaume Bernard

On-the-job learning consists in continuously learning while being used in production, in an open environment, meaning that the system has to deal on its own with situations and elements never seen before.

Natural Language Understanding

LIMSI_UPV at SemEval-2020 Task 9: Recurrent Convolutional Neural Network for Code-mixed Sentiment Analysis

no code implementations • SEMEVAL 2020 • Somnath Banerjee, Sahar Ghannay, Sophie Rosset, Anne Vilnat, Paolo Rosso

This paper describes the participation of the LIMSI_UPV team in SemEval-2020 Task 9: Sentiment Analysis for Code-Mixed Social Media Text.

Sentiment Analysis

Neural Networks approaches focused on French Spoken Language Understanding: application to the MEDIA Evaluation Task

1 code implementation • COLING 2020 • Sahar Ghannay, Christophe Servan, Sophie Rosset

In this paper, we present a study on a French Spoken Language Understanding (SLU) task: the MEDIA task.

Spoken Language Understanding Word Embeddings

LIMSI_UPV at SemEval-2020 Task 9: Recurrent Convolutional Neural Network for Code-mixed Sentiment Analysis

1 code implementation • 30 Aug 2020 • Somnath Banerjee, Sahar Ghannay, Sophie Rosset, Anne Vilnat, Paolo Rosso

This paper describes the participation of the LIMSI_UPV team in SemEval-2020 Task 9: Sentiment Analysis for Code-Mixed Social Media Text.

Sentiment Analysis

A Metric Learning Approach to Misogyny Categorization

no code implementations • WS 2020 • Juan Manuel Coria, Sahar Ghannay, Sophie Rosset, Hervé Bredin

The task of automatic misogyny identification and categorization has not received as much attention as other natural language tasks have, even though it is crucial for identifying hate speech in social Internet interactions.

Metric Learning Sentence +2

Où en sommes-nous dans la reconnaissance des entités nommées structurées à partir de la parole ? (Where are we in Named Entity Recognition from speech?)

no code implementations • JEPTALNRECITAL 2020 • Antoine Caubrière, Sophie Rosset, Yannick Estève, Antoine Laurent, Emmanuel Morin

The most recent data available for structured named entity recognition from speech in French come from the ETAPE evaluation campaign in 2012.

named-entity-recognition Named Entity Recognition +1

Where are we in Named Entity Recognition from Speech?

no code implementations • LREC 2020 • Antoine Caubrière, Sophie Rosset, Yannick Estève, Antoine Laurent, Emmanuel Morin

For this type of system, we propose an original 3-pass approach.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

A Comparison of Metric Learning Loss Functions for End-To-End Speaker Verification

1 code implementation • 31 Mar 2020 • Juan M. Coria, Hervé Bredin, Sahar Ghannay, Sophie Rosset

Despite the growing popularity of metric learning approaches, very little work has attempted to perform a fair comparison of these techniques for speaker verification.

Metric Learning Speaker Verification

DiaBLa: A Corpus of Bilingual Spontaneous Written Dialogues for Machine Translation

2 code implementations • 30 May 2019 • Rachel Bawden, Sophie Rosset, Thomas Lavergne, Eric Bilinski

We provide a preliminary analysis of the corpus to confirm that the participants' judgments reveal perceptible differences in MT quality between the two MT systems used.

Machine Translation Sentence +1

Survey on Evaluation Methods for Dialogue Systems

no code implementations • 10 May 2019 • Jan Deriu, Alvaro Rodrigo, Arantxa Otegi, Guillermo Echegoyen, Sophie Rosset, Eneko Agirre, Mark Cieliebak

We cover each class by introducing the main technologies developed for the dialogue systems and then by presenting the evaluation methods regarding this class.

Question Answering Task-Oriented Dialogue Systems

Natural language understanding for task oriented dialog in the biomedical domain in a low resources context

no code implementations • 23 Nov 2018 • Antoine Neuraz, Leonardo Campillos Llanos, Anita Burgun, Sophie Rosset

In the biomedical domain, the lack of sharable datasets often limits the possibility of developing natural language processing systems, especially dialogue applications and natural language understanding models.

Data Augmentation General Classification +6

Corpora with Part-of-Speech Annotations for Three Regional Languages of France: Alsatian, Occitan and Picard

no code implementations • LREC 2018 • Delphine Bernhard, Anne-Laure Ligozat, Fanny Martin, Myriam Bras, Pierre Magistry, Marianne Vergez-Couret, Lucie Steiblé, Pascale Erhart, Nabil Hathout, Dominique Huck, Christophe Rey, Philippe Reynés, Sophie Rosset, Jean Sibille, Thomas Lavergne

Detecting context-dependent sentences in parallel corpora

no code implementations • JEPTALNRECITAL 2018 • Rachel Bawden, Thomas Lavergne, Sophie Rosset

In this article, we provide several approaches to the automatic identification of parallel sentences that require sentence-external linguistic context to be correctly translated.

Machine Translation Sentence +1

Étiquetage en parties du discours de langues peu dotées par spécialisation des plongements lexicaux (POS tagging for low-resource languages by adapting word embeddings)

no code implementations • JEPTALNRECITAL 2018 • Pierre Magistry, Anne-Laure Ligozat, Sophie Rosset

This article presents a new part-of-speech tagging method suited to low-resource languages: the definition of the context used to build the word embeddings is adapted to the task, and new vectors are created for unknown words.

POS POS Tagging +1

Automatic classification of doctor-patient questions for a virtual patient record query task

no code implementations • WS 2017 • Leonardo Campillos Llanos, Sophie Rosset, Pierre Zweigenbaum

We present the work-in-progress of automating the classification of doctor-patient questions in the context of a simulated consultation with a virtual patient.

BIG-bench Machine Learning Dialogue Management +4

Apprendre des représentations jointes de mots et d'entités pour la désambiguïsation d'entités (Combining Word and Entity Embeddings for Entity Linking)

no code implementations • JEPTALNRECITAL 2017 • José Moreno, Romaric Besançon, Romain Beaumont, Eva D'hondt, Anne-Laure Ligozat, Sophie Rosset, Xavier Tannier, Brigitte Grau

Entity disambiguation (or entity linking), which consists of linking entity mentions in a text to entities in a knowledge base, is a problem that arises, among other settings, in the automatic population of knowledge bases from texts.

Entity Embeddings Entity Linking

Utterance Retrieval Based on Recurrent Surface Text Patterns

1 code implementation • 8 Apr 2017 • Guillaume Dubuisson Duplessis, Franck Charras, Vincent Letard, Anne-Laure Ligozat, Sophie Rosset

This paper investigates the use of recurrent surface text patterns to represent and index open-domain dialogue utterances for a retrieval system that can be embedded in a conversational agent.

Retrieval

Évaluation de l'apprentissage incrémental par analogie (Incremental Learning From Scratch Using Analogical Reasoning)

no code implementations • JEPTALNRECITAL 2016 • Vincent Letard, Gabriel Illouz, Sophie Rosset

This article examines the use of analogical reasoning in the context of incremental learning.

Incremental Learning

Un système automatique de sélection de réponse en domaine ouvert intégrable à un système de dialogue social (An automatic open-domain response selection system integrable to a social dialogue system)

no code implementations • JEPTALNRECITAL 2016 • Franck Charras, Guillaume Dubuisson Duplessis, Vincent Letard, Anne-Laure Ligozat, Sophie Rosset

This demonstration presents an open-domain dialogue system that uses a base of dialogue examples, automatically built from a corpus of subtitles, to handle "chatbot"-style social dialogue.

Chatbot

Estimation de la qualité d'un système de reconnaissance de la parole pour une tâche de compréhension (Quality estimation of a Speech Recognition System for a Spoken Language Understanding task)

no code implementations • JEPTALNRECITAL 2016 • Olivier Galibert, Nathalie Camelin, Paul Deléglise, Sophie Rosset

Here we compare different metrics, notably WER, NE-WER and ATENE, a recently proposed metric, for evaluating speech recognition systems given a named entity recognition task.

speech-recognition Speech Recognition +1

Comparaison de listes d'erreurs de transcription automatique de la parole : quelle complémentarité entre les différentes métriques ? (Comparing error lists for ASR systems: contribution of different metrics)

no code implementations • JEPTALNRECITAL 2016 • Olivier Galibert, Juliette Kahn, Sophie Rosset

The work presented here falls within the field of evaluating automatic speech recognition systems with a view to their use in a downstream task, here named entity recognition.

Generating Task-Pertinent sorted Error Lists for Speech Recognition

no code implementations • LREC 2016 • Olivier Galibert, Mohamed Ameur Ben Jannet, Juliette Kahn, Sophie Rosset

In the context of Automatic Speech Recognition (ASR) used as a first step towards Named Entity Recognition (NER) in speech, error seriousness is usually determined by their frequency, due to the use of the WER as metric to evaluate the ASR output, despite the emergence of more relevant measures in the literature.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

The CAMOMILE Collaborative Annotation Platform for Multi-modal, Multi-lingual and Multi-media Documents

no code implementations • LREC 2016 • Johann Poignant, Mateusz Budnik, Hervé Bredin, Claude Barras, Mickael Stefas, Pierrick Bruneau, Gilles Adda, Laurent Besacier, Hazim Ekenel, Gil Francopoulo, Javier Hernando, Joseph Mariani, Ramon Morros, Georges Quénot, Sophie Rosset, Thomas Tamisier

In this paper, we describe the organization and the implementation of the CAMOMILE collaborative annotation framework for multimodal, multimedia, multilingual (3M) data.

Active Learning Management

Transfer-Based Learning-to-Rank Assessment of Medical Term Technicality

no code implementations • LREC 2016 • Dhouha Bouamor, Leonardo Campillos Llanos, Anne-Laure Ligozat, Sophie Rosset, Pierre Zweigenbaum

While measuring the readability of texts has been a long-standing research topic, assessing the technicality of terms has only been addressed more recently and mostly for the English language.

Language Modelling Learning-To-Rank

Managing Linguistic and Terminological Variation in a Medical Dialogue System

no code implementations • LREC 2016 • Leonardo Campillos Llanos, Dhouha Bouamor, Pierre Zweigenbaum, Sophie Rosset

We introduce a dialogue task between a virtual patient and a doctor where the dialogue system, playing the patient part in a simulated consultation, must reconcile a specialized level, to understand what the doctor says, and a lay level, to output realistic patient-language utterances.

Sentence Spoken Language Understanding

Purely Corpus-based Automatic Conversation Authoring

no code implementations • LREC 2016 • Guillaume Dubuisson Duplessis, Vincent Letard, Anne-Laure Ligozat, Sophie Rosset

This system is used as a chatterbot system to collect a corpus of 41 open-domain textual dialogues with 27 human participants.

Named Entity Resources - Overview and Outlook

no code implementations • LREC 2016 • Maud Ehrmann, Damien Nouvel, Sophie Rosset

Recognition of real-world entities is crucial for most NLP applications.

Entity Linking

Description of the PatientGenesys Dialogue System

no code implementations • WS 2015 • Leonardo Campillos Llanos, Dhouha Bouamor, Éric Bilinski, Anne-Laure Ligozat, Pierre Zweigenbaum, Sophie Rosset

Identification de facteurs de risque pour des patients diabétiques à partir de comptes-rendus cliniques par des approches hybrides

no code implementations • JEPTALNRECITAL 2015 • Cyril Grouin, Véronique Moriceau, Sophie Rosset, Pierre Zweigenbaum

In this article, we present the methods we developed to analyze hospital reports written in English.

Un patient virtuel dialogant

no code implementations • JEPTALNRECITAL 2015 • Leonardo Campillos, Dhouha Bouamor, Éric Bilinski, Anne-Laure Ligozat, Pierre Zweigenbaum, Sophie Rosset

The demonstrator we describe here is a prototype dialogue system whose goal is to simulate a patient.

A Mapping-Based Approach for General Formal Human Computer Interaction Using Natural Language

no code implementations • ACL 2014 • Vincent Letard, Sophie Rosset, Gabriel Illouz

ETER : a new metric for the evaluation of hierarchical named entity recognition

no code implementations • LREC 2014 • Mohamed Ben Jannet, Martine Adda-Decker, Olivier Galibert, Juliette Kahn, Sophie Rosset

We then introduce a new metric, the Entity Tree Error Rate (ETER), to evaluate hierarchical and structured named entity detection, classification and decomposition.

Entity Extraction using GAN General Classification +3

Human annotation of ASR error regions: Is "gravity" a sharable concept for human annotators?

no code implementations • LREC 2014 • Daniel Luzzati, Cyril Grouin, Ioana Vasilescu, Martine Adda-Decker, Eric Bilinski, Nathalie Camelin, Juliette Kahn, Carole Lailler, Lori Lamel, Sophie Rosset

This paper is concerned with human assessments of the severity of errors in ASR outputs.

Information Retrieval Named Entity Recognition (NER) +1

Morpho-Syntactic Study of Errors from Speech Recognition System

no code implementations • LREC 2014 • Maria Goryainova, Cyril Grouin, Sophie Rosset, Ioana Vasilescu

The study provides an original standpoint of the speech transcription errors by focusing on the morpho-syntactic features of the erroneous chunks and of the surrounding left and right context.

Named Entity Recognition (NER) POS +3

The Quaero French Medical Corpus: A Ressource for Medical Entity Recognition and Normalization

no code implementations • LREC 2014 • Cyril Grouin, Jeremy Leixa, Aurélie Névéol, Sophie Rosset, Xavier Tannier, Pierre Zweigenbaum

Overall, a total of 26,409 entity annotations were mapped to 5,797 unique UMLS concepts.

Automatic Named Entity Pre-annotation for Out-of-domain Human Annotation

no code implementations • WS 2013 • Sophie Rosset, Cyril Grouin, Thomas Lavergne, Mohamed Ben Jannet, Jérémy Leixa, Olivier Galibert, Pierre Zweigenbaum

Web pages segmentation for document selection in Question Answering (Pré-segmentation de pages web et sélection de documents pertinents en Questions-Réponses) [in French]

no code implementations • JEPTALNRECITAL 2013 • Nicolas Foucault, Sophie Rosset, Gilles Adda

Question Answering

Modeling the Complexity of Manual Annotation Tasks: a Grid of Analysis

no code implementations • COLING 2012 • Karën Fort, Adeline Nazarenko, Sophie Rosset

Manual Corpus Annotation: Giving Meaning to the Evaluation Metrics

no code implementations • COLING 2012 • Yann Mathet, Antoine Widlöcher, Karën Fort, Claire François, Olivier Galibert, Cyril Grouin, Juliette Kahn, Sophie Rosset, Pierre Zweigenbaum

Structured Named Entities in two distinct press corpora: Contemporary Broadcast News and Old Newspapers

no code implementations • WS 2012 • Sophie Rosset, Cyril Grouin, Karën Fort, Olivier Galibert, Juliette Kahn, Pierre Zweigenbaum

Named Entity Recognition (NER)

Quel est l'apport de la détection d'entités nommées pour l'extraction d'information en domaine restreint ? (What is the contribution of named entities detection for information extraction in restricted domain?) [in French]

no code implementations • JEPTALNRECITAL 2012 • Camille Dutrey, Chloé Clavel, Sophie Rosset, Ioana Vasilescu, Martine Adda-Decker

Extended Named Entities Annotation on OCRed Documents: From Corpus Constitution to Evaluation Campaign

no code implementations • LREC 2012 • Olivier Galibert, Sophie Rosset, Cyril Grouin, Pierre Zweigenbaum, Ludovic Quintard

Within the framework of the Quaero project, we proposed a new definition of named entities, based upon an extension of the coverage of named entities as well as the structure of those named entities.

Named Entity Recognition (NER) Optical Character Recognition (OCR)

Tree-Structured Named Entity Recognition on OCR Data: Analysis, Processing and Results

no code implementations • LREC 2012 • Marco Dinarelli, Sophie Rosset

We evaluate our procedure for preprocessing OCR-ized data in two ways: in terms of perplexity and OOV rate of a language model on development and evaluation data, and in terms of the performance of the named entity detection system on the preprocessed data.

Language Modelling named-entity-recognition +3