Search Results for author: Guillaume Wisniewski

Found 61 papers, 3 papers with code

Comparison between NMT and PBSMT Performance for Translating Noisy User-Generated Content

no code implementations WS (NoDaLiDa) 2019 José Carlos Rosales Núñez, Djamé Seddah, Guillaume Wisniewski

This work compares the performance achieved by Phrase-Based Statistical Machine Translation systems (PB-SMT) and attention-based Neural Machine Translation systems (NMT) when translating User Generated Content (UGC), as encountered in social media, from French to English.

Machine Translation NMT +1

Biais de genre dans un système de traduction automatique neuronale : une étude préliminaire (Gender Bias in Neural Translation: a preliminary study)

no code implementations JEP/TALN/RECITAL 2021 Guillaume Wisniewski, Lichao Zhou, Nicolas Ballier, François Yvon

This article presents the first results of an ongoing study of gender bias in training corpora and in neural translation systems.

Establishing degrees of closeness between audio recordings along different dimensions using large-scale cross-lingual models

no code implementations 8 Feb 2024 Maxime Fily, Guillaume Wisniewski, Severine Guillaume, Gilles Adda, Alexis Michaud

We propose a new unsupervised method using ABX tests on audio recordings with carefully curated metadata to shed light on the type of information present in the representations.
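To make the evaluation concrete, here is a minimal sketch of an ABX discrimination score over pooled audio representations; the data, dimensions, and choice of cosine distance are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np

def cosine_distance(u, v):
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def abx_score(reps_a, reps_b):
    """Fraction of (A, B, X) triples for which X (drawn from the same
    category as A) is closer to A than to B."""
    correct, total = 0, 0
    for i, x in enumerate(reps_a):
        for j, a in enumerate(reps_a):
            if i == j:
                continue
            for b in reps_b:
                if cosine_distance(x, a) < cosine_distance(x, b):
                    correct += 1
                total += 1
    return correct / total

# Toy usage: 5 recordings per category, 16-dimensional pooled embeddings.
rng = np.random.default_rng(0)
category_a = rng.normal(0.0, 1.0, (5, 16))
category_b = rng.normal(0.5, 1.0, (5, 16))
print(f"ABX score: {abx_score(category_a, category_b):.2f}")
```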

Using Artificial French Data to Understand the Emergence of Gender Bias in Transformer Language Models

no code implementations 24 Oct 2023 Lina Conti, Guillaume Wisniewski

Numerous studies have demonstrated the ability of neural language models to learn various linguistic properties without direct supervision.

From "Snippet-lects" to Doculects and Dialects: Leveraging Neural Representations of Speech for Placing Audio Signals in a Language Landscape

no code implementations 29 May 2023 Séverine Guillaume, Guillaume Wisniewski, Alexis Michaud

We use max-pooling to aggregate the neural representations from a "snippet-lect" (the speech in a 5-second audio snippet) to a "doculect" (the speech in a given resource), then to dialects and languages.
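A minimal sketch of the aggregation described above, assuming frame-level features of a fixed dimension: max-pool within each 5-second snippet, then max-pool again across the snippets of a resource. Shapes and the feature extractor are placeholders, not the paper's exact setup.

```python
import numpy as np

def pool_snippet(frame_features: np.ndarray) -> np.ndarray:
    """(n_frames, dim) -> (dim,) representation of one snippet-lect."""
    return frame_features.max(axis=0)

def pool_doculect(snippet_vectors: list[np.ndarray]) -> np.ndarray:
    """List of snippet vectors -> (dim,) representation of a doculect."""
    return np.stack(snippet_vectors).max(axis=0)

# Toy usage with random stand-in "neural" features (dim = 32).
rng = np.random.default_rng(0)
snippets = [rng.normal(size=(250, 32)) for _ in range(10)]  # 10 snippets
doculect_vec = pool_doculect([pool_snippet(s) for s in snippets])
print(doculect_vec.shape)  # (32,)
```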

Assessing the Capacity of Transformer to Abstract Syntactic Representations: A Contrastive Analysis Based on Long-distance Agreement

1 code implementation 8 Dec 2022 Bingzhi Li, Guillaume Wisniewski, Benoît Crabbé

Long-distance agreement, a form of evidence for syntactic structure, is increasingly used to assess the syntactic generalization of Neural Language Models.

counterfactual Object
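As a rough illustration of such an agreement test, the sketch below scores minimal pairs with a hypothetical lm_log_prob function (e.g. summed token log-probabilities) and reports how often the grammatical verb form is preferred; this is not the paper's evaluation code.

```python
def agreement_accuracy(minimal_pairs, lm_log_prob):
    """Fraction of pairs where the model scores the grammatical sentence
    higher than its ungrammatical counterpart."""
    correct = 0
    for grammatical, ungrammatical in minimal_pairs:
        if lm_log_prob(grammatical) > lm_log_prob(ungrammatical):
            correct += 1
    return correct / len(minimal_pairs)

# Toy French pair with an attractor noun between subject and verb.
pairs = [
    ("Le chien des voisins dort.",      # 'dort' agrees with 'chien' (sing.)
     "Le chien des voisins dorment."),  # plural verb, attracted by 'voisins'
]
# accuracy = agreement_accuracy(pairs, lm_log_prob=my_model_score)
```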

Is the Language Familiarity Effect gradual? A computational modelling approach

no code implementations 27 Jun 2022 Maureen de Seyssel, Guillaume Wisniewski, Emmanuel Dupoux

According to the Language Familiarity Effect (LFE), people are better at discriminating between speakers of their native language.

Probing phoneme, language and speaker information in unsupervised speech representations

no code implementations 30 Mar 2022 Maureen de Seyssel, Marvin Lavechin, Yossi Adi, Emmanuel Dupoux, Guillaume Wisniewski

Language information, however, is very salient in the bilingual model only, suggesting CPC models learn to discriminate languages when trained on multiple languages.

Language Modelling
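The probing setup referred to above can be illustrated by a small diagnostic classifier trained on frozen representations to predict a property such as language identity. The sketch below uses scikit-learn and random stand-in features purely for illustration; it is not the authors' code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 256))   # frozen speech representations
labels = rng.integers(0, 2, size=1000)    # e.g. language id (0/1)

x_tr, x_te, y_tr, y_te = train_test_split(features, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(x_tr, y_tr)
print(f"probe accuracy: {probe.score(x_te, y_te):.2f}")  # ~0.5 on random data
```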

Screening Gender Transfer in Neural Machine Translation

no code implementations EMNLP (BlackboxNLP) 2021 Guillaume Wisniewski, Lichao Zhu, Nicolas Ballier, François Yvon

This paper aims at identifying the information flow in state-of-the-art machine translation systems, taking as example the transfer of gender when translating from French into English.

Machine Translation Translation

Mining Naturally-occurring Corrections and Paraphrases from Wikipedia's Revision History

no code implementations 25 Feb 2022 Aurélien Max, Guillaume Wisniewski

Naturally-occurring instances of linguistic phenomena are important both for training and for evaluating automatic processes on text.

Noisy UGC Translation at the Character Level: Revisiting Open-Vocabulary Capabilities and Robustness of Char-Based Models

1 code implementation WNUT (ACL) 2021 José Carlos Rosales Núñez, Guillaume Wisniewski, Djamé Seddah

This work explores the capacity of character-based Neural Machine Translation to translate noisy User-Generated Content (UGC), with a strong focus on exploring the limits of such approaches in handling productive UGC phenomena, which, almost by definition, cannot be seen at training time.

Machine Translation Translation

Understanding the Impact of UGC Specificities on Translation Quality

no code implementations WNUT (ACL) 2021 José Carlos Rosales Núñez, Djamé Seddah, Guillaume Wisniewski

This work takes a critical look at the evaluation of user-generated content automatic translation, the well-known specificities of which raise many challenges for MT.

Translation

Are Transformers a Modern Version of ELIZA? Observations on French Object Verb Agreement

no code implementations EMNLP 2021 Bingzhi Li, Guillaume Wisniewski, Benoit Crabbé

Many recent works have demonstrated that unsupervised sentence representations of neural networks encode syntactic information by observing that neural language models are able to predict the agreement between a verb and its subject.

Sentence

Phonetic Normalization for Machine Translation of User Generated Content

no code implementations WS 2019 José Carlos Rosales Núñez, Djamé Seddah, Guillaume Wisniewski

We present an approach to correct noisy User Generated Content (UGC) in French, aiming to produce a preprocessing pipeline that improves Machine Translation for this kind of non-canonical corpora.

Language Modelling Machine Translation +1

Combien d'exemples de tests sont-ils nécessaires à une évaluation fiable ? Quelques observations sur l'évaluation de l'analyse morphosyntaxique du français (Some observations on the evaluation of PoS taggers)

no code implementations JEPTALNRECITAL 2019 Guillaume Wisniewski

The goal of this work is to present several observations on the evaluation of PoS taggers for French, aiming to question the usual statistical learning framework in which the test and training sets are fixed arbitrarily and independently of the model under consideration.

POS SENTER

Quantifying training challenges of dependency parsers

no code implementations COLING 2018 Lauriane Aufrant, Guillaume Wisniewski, François Yvon

Not all dependencies are equal when training a dependency parser: some are straightforward enough to be learned with only a sample of data, others embed more complexity.

Cross-Lingual Transfer Dependency Parsing

Automated Paraphrase Lattice Creation for HyTER Machine Translation Evaluation

no code implementations NAACL 2018 Marianna Apidianaki, Guillaume Wisniewski, Anne Cocos, Chris Callison-Burch

We propose a variant of a well-known machine translation (MT) evaluation metric, HyTER (Dreyer and Marcu, 2012), which exploits reference translations enriched with meaning equivalent expressions.

Machine Translation Translation
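To give an idea of the kind of metric involved, here is a toy sketch in the spirit of HyTER: expand a reference with meaning-equivalent expressions and report the minimum word-level edit distance to the hypothesis. The paraphrase table and the brute-force expansion are illustrative simplifications, not the actual lattice-based implementation.

```python
from itertools import product

def edit_distance(a, b):
    """Word-level Levenshtein distance between two token sequences."""
    dp = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, wb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (wa != wb))
    return dp[-1]

def min_distance(hypothesis, reference, equivalents):
    """Minimum distance over all references built by substituting each
    reference word with any of its meaning-equivalent expressions."""
    hyp = hypothesis.split()
    options = [equivalents.get(w, [w]) for w in reference.split()]
    return min(edit_distance(hyp, ref) for ref in product(*options))

equivalents = {"car": ["car", "automobile"], "bought": ["bought", "purchased"]}
print(min_distance("he purchased an automobile",
                   "he bought a car", equivalents))  # 1 ("a" vs "an")
```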

Exploiting Dynamic Oracles to Train Projective Dependency Parsers on Non-Projective Trees

no code implementations NAACL 2018 Lauriane Aufrant, Guillaume Wisniewski, François Yvon

Because the most common transition systems are projective, training a transition-based dependency parser often means either ignoring or rewriting the non-projective training examples, which has an adverse impact on accuracy.

Dependency Parsing
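The property at stake, projectivity, can be checked in a few lines: a dependency tree is projective iff no two arcs cross. The head-list encoding below (heads[i] is the head of token i+1, with 0 for the root) is an assumed format for illustration only.

```python
def is_projective(heads):
    """Return True iff the dependency tree encoded by `heads` has no
    crossing arcs (artificial root at position 0)."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for i, (l1, r1) in enumerate(arcs):
        for l2, r2 in arcs[i + 1:]:
            # Two arcs cross when exactly one endpoint of one arc lies
            # strictly inside the span of the other.
            if (l1 < l2 < r1 < r2) or (l2 < l1 < r2 < r1):
                return False
    return True

print(is_projective([2, 0, 2]))     # True: simple projective chain
print(is_projective([3, 4, 0, 3]))  # False: arcs (1,3) and (2,4) cross
```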

Automatically Selecting the Best Dependency Annotation Design with Dynamic Oracles

no code implementations NAACL 2018 Guillaume Wisniewski, Ophélie Lacroix, François Yvon

This work introduces a new strategy to compare the numerous conventions that have been proposed over the years for expressing dependency structures and discover the one for which a parser will achieve the highest parsing performance.

Sentence

Analyse morpho-syntaxique en présence d'alternance codique (PoS tagging of Code Switching)

no code implementations JEPTALNRECITAL 2018 José Carlos Rosales Núñez, Guillaume Wisniewski

Code switching is the phenomenon of alternating between languages within a single conversation or a single sentence.

POS POS Tagging

Divergences entre annotations dans le projet Universal Dependencies et leur impact sur l'évaluation des performances d'étiquetage morpho-syntaxique (Evaluating Annotation Divergences in the UD Project)

no code implementations JEPTALNRECITAL 2018 Guillaume Wisniewski, François Yvon

This work shows that the performance drop often observed when applying a PoS tagger to out-of-domain data frequently results from inconsistencies between the annotations of the test and training sets.

LIMSI@CoNLL'17: UD Shared Task

no code implementations CONLL 2017 Lauriane Aufrant, Guillaume Wisniewski, François Yvon

This paper describes LIMSI's submission to the CoNLL 2017 UD Shared Task, which is focused on small treebanks, and shows how to improve low-resource parsing using only ad hoc combinations of multiple views and resources.

Model Selection

Adaptation au domaine pour l'analyse morpho-syntaxique (Domain Adaptation for PoS tagging)

no code implementations JEPTALNRECITAL 2017 Éléonor Bartenlian, Margot Lacour, Matthieu Labeau, Alexandre Allauzen, Guillaume Wisniewski, François Yvon

This work seeks to understand why the performance of a PoS tagger drops sharply when it is used on out-of-domain data.

Domain Adaptation POS +1

Zero-resource Dependency Parsing: Boosting Delexicalized Cross-lingual Transfer with Linguistic Knowledge

no code implementations COLING 2016 Lauriane Aufrant, Guillaume Wisniewski, François Yvon

This paper studies cross-lingual transfer for dependency parsing, focusing on very low-resource settings where delexicalized transfer is the only fully automatic option.

Active Learning Cross-Lingual Transfer +3

Ne nous arrêtons pas en si bon chemin : améliorations de l'apprentissage global d'analyseurs en dépendances par transition (Don't Stop Me Now! Improved Update Strategies for Global Training of Transition-Based Dependency Parsers)

no code implementations JEPTALNRECITAL 2016 Lauriane Aufrant, Guillaume Wisniewski, François Yvon

In this article, we propose three simple improvements to the global training of ARC-EAGER transition-based dependency parsers: a non-deterministic oracle, resuming on the same example after an update, and training in sub-optimal configurations.

Cross-lingual and Supervised Models for Morphosyntactic Annotation: a Comparison on Romanian

no code implementations LREC 2016 Lauriane Aufrant, Guillaume Wisniewski, François Yvon

Because of the small size of Romanian corpora, the performance of a PoS tagger or a dependency parser trained with standard supervised methods falls far short of the performance achieved in most languages.

Cross-Lingual Transfer POS

Oublier ce qu'on sait, pour mieux apprendre ce qu'on ne sait pas : une étude sur les contraintes de type dans les modèles CRF (Forgetting what we know to better learn what we don't: a study of type constraints in CRF models)

no code implementations JEPTALNRECITAL 2015 Nicolas Pécheux, Alexandre Allauzen, Thomas Lavergne, Guillaume Wisniewski, François Yvon

When prior knowledge about the possible outputs of a labelling problem is available, it seems desirable to include this information during training in order to simplify the modelling task and speed up processing.

Apprentissage par imitation pour l'étiquetage de séquences : vers une formalisation des méthodes d'étiquetage easy-first (Imitation learning for sequence labelling: towards a formalization of easy-first labelling methods)

no code implementations JEPTALNRECITAL 2015 Elena Knyazeva, Guillaume Wisniewski, François Yvon

Thanks to the link we draw between structured learning and reinforcement learning, we are able to propose a theoretically well-justified method for learning approximate inference methods. The experiments we carry out on four NLP tasks validate the proposed approach.

A Corpus of Machine Translation Errors Extracted from Translation Students Exercises

no code implementations LREC 2014 Guillaume Wisniewski, Natalie Kübler, François Yvon

In this paper, we present a freely available corpus of automatic translations accompanied by post-edited versions and annotated with labels identifying the different kinds of errors made by the MT system.

Machine Translation Translation
