no code implementations • JEP/TALN/RECITAL 2022 • Lichao Zhu, Guillaume Wisniewski, Nicolas Ballier, François Yvon
This work presents two series of experiments aimed at identifying information flows in neural translation systems.
no code implementations • JEP/TALN/RECITAL 2022 • Nicolas Devatine, Caio Corro, François Yvon
This paper addresses the cross-lingual transfer of dependency parsers and studies methods for limiting the potentially harmful effect on transfer of word-order divergences between the source and target languages.
no code implementations • JEP/TALN/RECITAL 2022 • Shu Okabe, François Yvon
Automatic segmentation into words and morphemes is a crucial step in the language documentation process.
no code implementations • EAMT 2022 • Minh-Quang Pham, Josep Crego, François Yvon
In this paper, we study dynamic data selection strategies that are able to automatically re-evaluate the usefulness of data samples and to evolve a data selection policy in the course of training.
1 code implementation • ACL 2022 • Shu Okabe, Laurent Besacier, François Yvon
Word and morpheme segmentation are fundamental steps of language documentation, as they make it possible to discover lexical units in a language for which the lexicon is unknown.
no code implementations • IWSLT 2016 • Franck Burlot, Matthieu Labeau, Elena Knyazeva, Thomas Lavergne, Alexandre Allauzen, François Yvon
This paper describes LIMSI’s submission to the MT track of IWSLT 2016.
no code implementations • IWSLT 2016 • Franck Burlot, Elena Knyazeva, Thomas Lavergne, François Yvon
This paper describes a two-step machine translation system that addresses the issue of translating into a morphologically rich language (English to Czech), by performing separately the translation and the generation of target morphology.
no code implementations • EMNLP (IWSLT) 2019 • MinhQuang Pham, Josep Crego, François Yvon, Jean Senellart
Supervised machine translation works well when the train and test data are sampled from the same distribution.
no code implementations • WMT (EMNLP) 2021 • Jitao Xu, Minh Quang Pham, Sadaf Abdul Rauf, François Yvon
This paper describes LISN’s submissions to two shared tasks at WMT’21.
no code implementations • WMT (EMNLP) 2020 • Sadaf Abdul Rauf, José Carlos Rosales Núñez, Minh Quang Pham, François Yvon
This paper describes LIMSI’s submissions to the translation shared tasks at WMT’20.
1 code implementation • WMT (EMNLP) 2020 • Minh Quang Pham, Josep Maria Crego, François Yvon, Jean Senellart
Domain adaptation is an old and vexing problem for machine translation systems.
no code implementations • WMT (EMNLP) 2020 • Minh Quang Pham, Jitao Xu, Josep Crego, François Yvon, Jean Senellart
Priming is a well-known and well-studied psychological phenomenon based on the prior presentation of one stimulus (cue) to influence the processing of a response.
no code implementations • MTSummit 2021 • Anh Khoa Ngo Ho, François Yvon
Word alignments identify translational correspondences between words in a parallel sentence pair and are used, for example, to train statistical machine translation systems, to learn bilingual dictionaries, or to perform quality estimation.
no code implementations • JEP/TALN/RECITAL 2021 • François Buet, François Yvon
One way to perform automatic monolingual subtitling is to combine a speech recognition system with a model that translates the transcript into subtitles.
no code implementations • JEP/TALN/RECITAL 2021 • Guillaume Wisniewski, Lichao Zhou, Nicolas Ballier, François Yvon
This paper presents the first results of an ongoing study on gender bias in training corpora and in neural machine translation systems.
1 code implementation • 19 May 2022 • Alina Karakanta, François Buet, Mauro Cettolo, François Yvon
Subtitle segmentation can be evaluated with sequence segmentation metrics against a human reference.
1 code implementation • IWSLT (ACL) 2022 • Jitao Xu, François Buet, Josep Crego, Elise Bertin-Lemée, François Yvon
As the amount of audio-visual content increases, the need to develop automatic captioning and subtitling solutions to match the expectations of a growing international audience appears as the only viable way to boost throughput and lower the related post-production costs.
no code implementations • Findings (ACL) 2022 • Ayyoob Imani, Lütfi Kerem Şenel, Masoud Jalili Sabet, François Yvon, Hinrich Schütze
First, we create a multiparallel word alignment graph, joining all bilingual word alignment pairs in one graph.
no code implementations • EMNLP (BlackboxNLP) 2021 • Guillaume Wisniewski, Lichao Zhu, Nicolas Ballier, François Yvon
This paper aims at identifying the information flow in state-of-the-art machine translation systems, taking as example the transfer of gender when translating from French into English.
1 code implementation • EMNLP 2021 • Jitao Xu, François Yvon
Machine translation is generally understood as generating one target text from an input source document.
1 code implementation • EMNLP 2021 • Ayyoob Imani, Masoud Jalili Sabet, Lütfi Kerem Şenel, Philipp Dufter, François Yvon, Hinrich Schütze
With the advent of end-to-end deep learning approaches in machine translation, interest in word alignments initially decreased; however, they have again become a focus of research more recently.
no code implementations • NAACL (CALCS) 2021 • Jitao Xu, François Yvon
Code-Switching (CSW) is a common phenomenon that occurs in multilingual geographic or social contexts, which raises challenging problems for natural language processing tools.
no code implementations • AMTA 2020 • Anh Khoa Ngo Ho, François Yvon
Word alignments identify translational correspondences between words in a parallel sentence pair and are used, for instance, to learn bilingual dictionaries, to train statistical machine translation systems or to perform quality estimation.
no code implementations • EMNLP (IWSLT) 2019 • Anh Khoa Ngo Ho, François Yvon
Word alignments identify translational correspondences between words in a parallel sentence pair and are used, for instance, to learn bilingual dictionaries, to train statistical machine translation systems, or to perform quality estimation.
2 code implementations • Findings of the Association for Computational Linguistics 2020 • Masoud Jalili Sabet, Philipp Dufter, François Yvon, Hinrich Schütze
We find that alignments created from embeddings are superior for four and comparable for two language pairs compared to those produced by traditional statistical aligners, even with abundant parallel data; e.g., contextualized embeddings achieve a word alignment F1 for English-German that is 5 percentage points higher than eflomal, a high-quality statistical aligner, trained on 100k parallel sentences.
no code implementations • LREC 2020 • Georg Rehm, Katrin Marheinecke, Stefanie Hegele, Stelios Piperidis, Kalina Bontcheva, Jan Hajič, Khalid Choukri, Andrejs Vasiļjevs, Gerhard Backfried, Christoph Prinz, José Manuel Gómez Pérez, Luc Meertens, Paul Lukowicz, Josef van Genabith, Andrea Lösch, Philipp Slusallek, Morten Irgens, Patrick Gatellier, Joachim Köhler, Laure Le Bars, Dimitra Anastasiou, Albina Auksoriūtė, Núria Bel, António Branco, Gerhard Budin, Walter Daelemans, Koenraad De Smedt, Radovan Garabík, Maria Gavriilidou, Dagmar Gromann, Svetla Koeva, Simon Krek, Cvetana Krstev, Krister Lindén, Bernardo Magnini, Jan Odijk, Maciej Ogrodniczuk, Eiríkur Rögnvaldsson, Mike Rosner, Bolette Sandford Pedersen, Inguna Skadiņa, Marko Tadić, Dan Tufiş, Tamás Váradi, Kadri Vider, Andy Way, François Yvon
Multilingualism is a cultural cornerstone of Europe and firmly anchored in the European treaties including full language equality.
1 code implementation • WS 2018 • Franck Burlot, François Yvon
Our findings confirm that back-translation is very effective and give new explanations as to why this is the case.
no code implementations • 18 Jun 2018 • Pierre Godard, Marcely Zanon-Boito, Lucas Ondel, Alexandre Berard, François Yvon, Aline Villavicencio, Laurent Besacier
We present a first attempt to perform attentional word segmentation directly from the speech signal, with the final goal to automatically identify lexical units in a low-resource, unwritten language (UL).
no code implementations • 16 Feb 2018 • Lucas Ondel, Pierre Godard, Laurent Besacier, Elin Larsen, Mark Hasegawa-Johnson, Odette Scharenborg, Emmanuel Dupoux, Lukas Burget, François Yvon, Sanjeev Khudanpur
Developing speech technologies for low-resource languages has become a very active research field over the last decade.