Search Results for author: Guillaume Wisniewski

Found 61 papers, 3 papers with code

Comparison between NMT and PBSMT Performance for Translating Noisy User-Generated Content

no code implementations WS (NoDaLiDa) 2019 José Carlos Rosales Núñez, Djamé Seddah, Guillaume Wisniewski

This work compares the performance achieved by Phrase-Based Statistical Machine Translation systems (PB-SMT) and attention-based Neural Machine Translation systems (NMT) when translating User Generated Content (UGC), as encountered in social media, from French to English.

Machine Translation NMT +1

Biais de genre dans un système de traduction automatique neuronale : une étude préliminaire (Gender Bias in Neural Translation: a preliminary study)

no code implementations JEP/TALN/RECITAL 2021 Guillaume Wisniewski, Lichao Zhou, Nicolas Ballier, François Yvon

This article presents the first results of an ongoing study of gender bias in training corpora and in neural translation systems.

Establishing degrees of closeness between audio recordings along different dimensions using large-scale cross-lingual models

no code implementations 8 Feb 2024 Maxime Fily, Guillaume Wisniewski, Severine Guillaume, Gilles Adda, Alexis Michaud

We propose a new unsupervised method using ABX tests on audio recordings with carefully curated metadata to shed light on the type of information present in the representations.
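To make the evaluation concrete, here is a minimal sketch of an ABX discrimination score over pooled audio representations; the data, dimensions, and choice of cosine distance are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np

def cosine_distance(u, v):
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def abx_score(reps_a, reps_b):
    """Fraction of (A, B, X) triples for which X (drawn from the same
    category as A) is closer to A than to B."""
    correct, total = 0, 0
    for i, x in enumerate(reps_a):
        for j, a in enumerate(reps_a):
            if i == j:
                continue
            for b in reps_b:
                if cosine_distance(x, a) < cosine_distance(x, b):
                    correct += 1
                total += 1
    return correct / total

# Toy usage: 5 recordings per category, 16-dimensional pooled embeddings.
rng = np.random.default_rng(0)
category_a = rng.normal(0.0, 1.0, (5, 16))
category_b = rng.normal(0.5, 1.0, (5, 16))
print(f"ABX score: {abx_score(category_a, category_b):.2f}")
```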

Using Artificial French Data to Understand the Emergence of Gender Bias in Transformer Language Models

no code implementations 24 Oct 2023 Lina Conti, Guillaume Wisniewski

Numerous studies have demonstrated the ability of neural language models to learn various linguistic properties without direct supervision.

From "Snippet-lects" to Doculects and Dialects: Leveraging Neural Representations of Speech for Placing Audio Signals in a Language Landscape

no code implementations 29 May 2023 Séverine Guillaume, Guillaume Wisniewski, Alexis Michaud

We use max-pooling to aggregate the neural representations from a "snippet-lect" (the speech in a 5-second audio snippet) to a "doculect" (the speech in a given resource), then to dialects and languages.
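A minimal sketch of the aggregation described above, assuming frame-level features of a fixed dimension: max-pool within each 5-second snippet, then max-pool again across the snippets of a resource. Shapes and the feature extractor are placeholders, not the paper's exact setup.

```python
import numpy as np

def pool_snippet(frame_features: np.ndarray) -> np.ndarray:
    """(n_frames, dim) -> (dim,) representation of one snippet-lect."""
    return frame_features.max(axis=0)

def pool_doculect(snippet_vectors: list[np.ndarray]) -> np.ndarray:
    """List of snippet vectors -> (dim,) representation of a doculect."""
    return np.stack(snippet_vectors).max(axis=0)

# Toy usage with random stand-in "neural" features (dim = 32).
rng = np.random.default_rng(0)
snippets = [rng.normal(size=(250, 32)) for _ in range(10)]  # 10 snippets
doculect_vec = pool_doculect([pool_snippet(s) for s in snippets])
print(doculect_vec.shape)  # (32,)
```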

Assessing the Capacity of Transformer to Abstract Syntactic Representations: A Contrastive Analysis Based on Long-distance Agreement

1 code implementation 8 Dec 2022 Bingzhi Li, Guillaume Wisniewski, Benoît Crabbé

Long-distance agreement, a form of evidence for syntactic structure, is increasingly used to assess the syntactic generalization of Neural Language Models.

counterfactual Object
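As a rough illustration of such an agreement test, the sketch below scores minimal pairs with a hypothetical lm_log_prob function (e.g. summed token log-probabilities) and reports how often the grammatical verb form is preferred; this is not the paper's evaluation code.

```python
def agreement_accuracy(minimal_pairs, lm_log_prob):
    """Fraction of pairs where the model scores the grammatical sentence
    higher than its ungrammatical counterpart."""
    correct = 0
    for grammatical, ungrammatical in minimal_pairs:
        if lm_log_prob(grammatical) > lm_log_prob(ungrammatical):
            correct += 1
    return correct / len(minimal_pairs)

# Toy French pair with an attractor noun between subject and verb.
pairs = [
    ("Le chien des voisins dort.",      # 'dort' agrees with 'chien' (sing.)
     "Le chien des voisins dorment."),  # plural verb, attracted by 'voisins'
]
# accuracy = agreement_accuracy(pairs, lm_log_prob=my_model_score)
```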

Is the Language Familiarity Effect gradual? A computational modelling approach

no code implementations 27 Jun 2022 Maureen de Seyssel, Guillaume Wisniewski, Emmanuel Dupoux

According to the Language Familiarity Effect (LFE), people are better at discriminating between speakers of their native language.

Probing phoneme, language and speaker information in unsupervised speech representations

no code implementations 30 Mar 2022 Maureen de Seyssel, Marvin Lavechin, Yossi Adi, Emmanuel Dupoux, Guillaume Wisniewski

Language information, however, is very salient in the bilingual model only, suggesting CPC models learn to discriminate languages when trained on multiple languages.

Language Modelling
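The probing setup referred to above can be illustrated by a small diagnostic classifier trained on frozen representations to predict a property such as language identity. The sketch below uses scikit-learn and random stand-in features purely for illustration; it is not the authors' code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 256))   # frozen speech representations
labels = rng.integers(0, 2, size=1000)    # e.g. language id (0/1)

x_tr, x_te, y_tr, y_te = train_test_split(features, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(x_tr, y_tr)
print(f"probe accuracy: {probe.score(x_te, y_te):.2f}")  # ~0.5 on random data
```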

Screening Gender Transfer in Neural Machine Translation

no code implementations EMNLP (BlackboxNLP) 2021 Guillaume Wisniewski, Lichao Zhu, Nicolas Ballier, François Yvon

This paper aims at identifying the information flow in state-of-the-art machine translation systems, taking as example the transfer of gender when translating from French into English.

Machine Translation Translation

Mining Naturally-occurring Corrections and Paraphrases from Wikipedia's Revision History

no code implementations 25 Feb 2022 Aurélien Max, Guillaume Wisniewski

Naturally-occurring instances of linguistic phenomena are important both for training and for evaluating automatic processes on text.

Noisy UGC Translation at the Character Level: Revisiting Open-Vocabulary Capabilities and Robustness of Char-Based Models

1 code implementation WNUT (ACL) 2021 José Carlos Rosales Núñez, Guillaume Wisniewski, Djamé Seddah

This work explores the capacity of character-based Neural Machine Translation to translate noisy User-Generated Content (UGC), with a strong focus on exploring the limits of such approaches in handling productive UGC phenomena, which, almost by definition, cannot be seen at training time.

Machine Translation Translation

Understanding the Impact of UGC Specificities on Translation Quality

no code implementations WNUT (ACL) 2021 José Carlos Rosales Núñez, Djamé Seddah, Guillaume Wisniewski

This work takes a critical look at the evaluation of user-generated content automatic translation, the well-known specificities of which raise many challenges for MT.

Translation

Are Transformers a Modern Version of ELIZA? Observations on French Object Verb Agreement

no code implementations EMNLP 2021 Bingzhi Li, Guillaume Wisniewski, Benoit Crabbé

Many recent works have demonstrated that unsupervised sentence representations of neural networks encode syntactic information by observing that neural language models are able to predict the agreement between a verb and its subject.

Sentence

Phonetic Normalization for Machine Translation of User Generated Content

no code implementations WS 2019 José Carlos Rosales Núñez, Djamé Seddah, Guillaume Wisniewski

We present an approach to correct noisy User Generated Content (UGC) in French, aiming to produce a preprocessing pipeline that improves Machine Translation for this kind of non-canonical corpora.

Language Modelling Machine Translation +1

Combien d'exemples de tests sont-ils nécessaires à une évaluation fiable ? Quelques observations sur l'évaluation de l'analyse morphosyntaxique du français (Some observations on the evaluation of PoS taggers)

no code implementations JEPTALNRECITAL 2019 Guillaume Wisniewski

The goal of this work is to present several observations on the evaluation of PoS taggers for French, aiming to question the usual statistical learning framework in which the test and training sets are fixed arbitrarily and independently of the model under consideration.

POS SENTER

Quantifying training challenges of dependency parsers

no code implementations COLING 2018 Lauriane Aufrant, Guillaume Wisniewski, François Yvon

Not all dependencies are equal when training a dependency parser: some are straightforward enough to be learned with only a sample of data, others embed more complexity.

Cross-Lingual Transfer Dependency Parsing

Automated Paraphrase Lattice Creation for HyTER Machine Translation Evaluation

no code implementations NAACL 2018 Marianna Apidianaki, Guillaume Wisniewski, Anne Cocos, Chris Callison-Burch

We propose a variant of a well-known machine translation (MT) evaluation metric, HyTER (Dreyer and Marcu, 2012), which exploits reference translations enriched with meaning equivalent expressions.

Machine Translation Translation
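To give an idea of the kind of metric involved, here is a toy sketch in the spirit of HyTER: expand a reference with meaning-equivalent expressions and report the minimum word-level edit distance to the hypothesis. The paraphrase table and the brute-force expansion are illustrative simplifications, not the actual lattice-based implementation.

```python
from itertools import product

def edit_distance(a, b):
    """Word-level Levenshtein distance between two token sequences."""
    dp = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, wb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (wa != wb))
    return dp[-1]

def min_distance(hypothesis, reference, equivalents):
    """Minimum distance over all references built by substituting each
    reference word with any of its meaning-equivalent expressions."""
    hyp = hypothesis.split()
    options = [equivalents.get(w, [w]) for w in reference.split()]
    return min(edit_distance(hyp, ref) for ref in product(*options))

equivalents = {"car": ["car", "automobile"], "bought": ["bought", "purchased"]}
print(min_distance("he purchased an automobile",
                   "he bought a car", equivalents))  # 1 ("a" vs "an")
```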

Exploiting Dynamic Oracles to Train Projective Dependency Parsers on Non-Projective Trees

no code implementations NAACL 2018 Lauriane Aufrant, Guillaume Wisniewski, François Yvon

Because the most common transition systems are projective, training a transition-based dependency parser often means either ignoring or rewriting the non-projective training examples, which has an adverse impact on accuracy.

Dependency Parsing
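The property at stake, projectivity, can be checked in a few lines: a dependency tree is projective iff no two arcs cross. The head-list encoding below (heads[i] is the head of token i+1, with 0 for the root) is an assumed format for illustration only.

```python
def is_projective(heads):
    """Return True iff the dependency tree encoded by `heads` has no
    crossing arcs (artificial root at position 0)."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for i, (l1, r1) in enumerate(arcs):
        for l2, r2 in arcs[i + 1:]:
            # Two arcs cross when exactly one endpoint of one arc lies
            # strictly inside the span of the other.
            if (l1 < l2 < r1 < r2) or (l2 < l1 < r2 < r1):
                return False
    return True

print(is_projective([2, 0, 2]))     # True: simple projective chain
print(is_projective([3, 4, 0, 3]))  # False: arcs (1,3) and (2,4) cross
```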

Automatically Selecting the Best Dependency Annotation Design with Dynamic Oracles

no code implementations NAACL 2018 Guillaume Wisniewski, Ophélie Lacroix, François Yvon

This work introduces a new strategy to compare the numerous conventions that have been proposed over the years for expressing dependency structures and discover the one for which a parser will achieve the highest parsing performance.

Sentence

Analyse morpho-syntaxique en présence d'alternance codique (PoS tagging of Code Switching)

no code implementations JEPTALNRECITAL 2018 José Carlos Rosales Núñez, Guillaume Wisniewski

Code switching is the phenomenon of alternating between languages within a single conversation or a single sentence.

POS POS Tagging

Divergences entre annotations dans le projet Universal Dependencies et leur impact sur l'évaluation des performances d'étiquetage morpho-syntaxique (Evaluating Annotation Divergences in the UD Project)

no code implementations JEPTALNRECITAL 2018 Guillaume Wisniewski, François Yvon

This work shows that the performance drop often observed when applying a PoS tagger to out-of-domain data frequently results from inconsistencies between the annotations of the test and training sets.

LIMSI@CoNLL'17: UD Shared Task

no code implementations CONLL 2017 Lauriane Aufrant, Guillaume Wisniewski, François Yvon

This paper describes LIMSI's submission to the CoNLL 2017 UD Shared Task, which is focused on small treebanks, and shows how to improve low-resource parsing using only ad hoc combinations of multiple views and resources.

Model Selection

Adaptation au domaine pour l'analyse morpho-syntaxique (Domain Adaptation for PoS tagging)

no code implementations JEPTALNRECITAL 2017 Éléonor Bartenlian, Margot Lacour, Matthieu Labeau, Alexandre Allauzen, Guillaume Wisniewski, François Yvon

This work seeks to understand why the performance of a PoS tagger drops sharply when it is used on out-of-domain data.

Domain Adaptation POS +1

Zero-resource Dependency Parsing: Boosting Delexicalized Cross-lingual Transfer with Linguistic Knowledge

no code implementations COLING 2016 Lauriane Aufrant, Guillaume Wisniewski, François Yvon

This paper studies cross-lingual transfer for dependency parsing, focusing on very low-resource settings where delexicalized transfer is the only fully automatic option.

Active Learning Cross-Lingual Transfer +3

Ne nous arrêtons pas en si bon chemin : améliorations de l'apprentissage global d'analyseurs en dépendances par transition (Don't Stop Me Now! Improved Update Strategies for Global Training of Transition-Based Dependency Parsers)

no code implementations JEPTALNRECITAL 2016 Lauriane Aufrant, Guillaume Wisniewski, François Yvon

In this article, we propose three simple improvements to the global training of ARC-EAGER transition-based dependency parsers: a non-deterministic oracle, resuming on the same example after an update, and training in sub-optimal configurations.

Cross-lingual and Supervised Models for Morphosyntactic Annotation: a Comparison on Romanian

no code implementations LREC 2016 Lauriane Aufrant, Guillaume Wisniewski, François Yvon

Because of the small size of Romanian corpora, the performance of a PoS tagger or a dependency parser trained with standard supervised methods falls far short of the performance achieved in most languages.

Cross-Lingual Transfer POS

Oublier ce qu'on sait, pour mieux apprendre ce qu'on ne sait pas : une étude sur les contraintes de type dans les modèles CRF (Forgetting what we know to better learn what we don't: a study of type constraints in CRF models)

no code implementations JEPTALNRECITAL 2015 Nicolas Pécheux, Alexandre Allauzen, Thomas Lavergne, Guillaume Wisniewski, François Yvon

When prior knowledge about the possible outputs of a labelling problem is available, it seems desirable to include this information during training in order to simplify the modelling task and speed up processing.

Apprentissage par imitation pour l'étiquetage de séquences : vers une formalisation des méthodes d'étiquetage easy-first (Imitation learning for sequence labelling: towards a formalization of easy-first labelling methods)

no code implementations JEPTALNRECITAL 2015 Elena Knyazeva, Guillaume Wisniewski, François Yvon

Thanks to the link we draw between structured learning and reinforcement learning, we are able to propose a theoretically well-justified method for learning approximate inference methods. The experiments we carry out on four NLP tasks validate the proposed approach.

A Corpus of Machine Translation Errors Extracted from Translation Students Exercises

no code implementations LREC 2014 Guillaume Wisniewski, Natalie Kübler, François Yvon

In this paper, we present a freely available corpus of automatic translations accompanied by post-edited versions and annotated with labels identifying the different kinds of errors made by the MT system.

Machine Translation Translation
