no code implementations • WMT (EMNLP) 2021 • Md Mahfuz ibn Alam, Ivana Kvapilíková, Antonios Anastasopoulos, Laurent Besacier, Georgiana Dinu, Marcello Federico, Matthias Gallé, Kweonwoo Jung, Philipp Koehn, Vassilina Nikoulina
Language domains that require very careful use of terminology are abundant and reflect a significant part of the translation industry.
no code implementations • ACL (IWSLT) 2021 • Antonios Anastasopoulos, Ondřej Bojar, Jacob Bremerman, Roldano Cattoni, Maha Elbayad, Marcello Federico, Xutai Ma, Satoshi Nakamura, Matteo Negri, Jan Niehues, Juan Pino, Elizabeth Salesky, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Alexander Waibel, Changhan Wang, Matthew Wiesner
The evaluation campaign of the International Conference on Spoken Language Translation (IWSLT 2021) featured four shared tasks this year: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Multilingual speech translation, and (iv) Low-resource speech translation.
no code implementations • IWSLT (EMNLP) 2018 • Luisa Bentivogli, Mauro Cettolo, Marcello Federico, Christian Federmann
In this paper we present an analysis of the two most prominent methodologies used for the human evaluation of MT quality, namely evaluation based on Post-Editing (PE) and evaluation based on Direct Assessment (DA).
no code implementations • IWSLT (ACL) 2022 • Antonios Anastasopoulos, Loïc Barrault, Luisa Bentivogli, Marcely Zanon Boito, Ondřej Bojar, Roldano Cattoni, Anna Currey, Georgiana Dinu, Kevin Duh, Maha Elbayad, Clara Emmanuel, Yannick Estève, Marcello Federico, Christian Federmann, Souhir Gahbiche, Hongyu Gong, Roman Grundkiewicz, Barry Haddow, Benjamin Hsu, Dávid Javorský, Věra Kloudová, Surafel Lakew, Xutai Ma, Prashant Mathur, Paul McNamee, Kenton Murray, Maria Nădejde, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, John Ortega, Juan Pino, Elizabeth Salesky, Jiatong Shi, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Yogesh Virkar, Alexander Waibel, Changhan Wang, Shinji Watanabe
The evaluation campaign of the 19th International Conference on Spoken Language Translation featured eight shared tasks: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Speech to speech translation, (iv) Low-resource speech translation, (v) Multilingual speech translation, (vi) Dialect speech translation, (vii) Formality control for speech translation, (viii) Isometric speech translation.
1 code implementation • IWSLT 2017 • Mattia Antonino Di Gangi, Marcello Federico
When little data exist for a language pair, the model cannot produce good representations for words, particularly rare ones.
no code implementations • IWSLT 2017 • Surafel M. Lakew, Quintino F. Lotito, Marco Turchi, Matteo Negri, Marcello Federico
In particular, we focus on the four zero-shot directions and show how a multilingual model trained on small data can provide reasonable results.
no code implementations • IWSLT 2017 • Mauro Cettolo, Marcello Federico, Luisa Bentivogli, Jan Niehues, Sebastian Stüker, Katsuhito Sudoh, Koichiro Yoshino, Christian Federmann
The IWSLT 2017 evaluation campaign has organised three tasks.
1 code implementation • ACL (IWSLT) 2021 • David Vilar, Marcello Federico
Sub-word segmentation is currently a standard tool for training neural machine translation (MT) systems and other NLP tasks.
no code implementations • IWSLT (EMNLP) 2018 • Surafel M. Lakew, Marcello Federico
In the experimental setting, an extremely low-resourced Basque-English language pair (i.e., ≈5.6K in-domain training data) is our target translation task, where we considered closely related French/Spanish-English parallel data to build the multilingual model.
no code implementations • IWSLT (EMNLP) 2018 • Jan Niehues, Roldano Cattoni, Sebastian Stüker, Mauro Cettolo, Marco Turchi, Marcello Federico
The International Workshop on Spoken Language Translation (IWSLT) 2018 Evaluation Campaign featured two tasks: low-resource machine translation and speech translation.
no code implementations • IWSLT 2016 • M. Amin Farajian, Rajen Chatterjee, Costanza Conforti, Shahab Jalalvand, Vevake Balaraman, Mattia A. Di Gangi, Duygu Ataman, Marco Turchi, Matteo Negri, Marcello Federico
They leverage linguistic information such as lemmas and part-of-speech tags of the source words in the form of additional factors along with the words.
no code implementations • IWSLT 2016 • Mauro Cettolo, Jan Niehues, Sebastian Stüker, Luisa Bentivogli, Roldano Cattoni, Marcello Federico
The IWSLT 2016 Evaluation Campaign featured two tasks: the translation of talks and the translation of video conference conversations.
no code implementations • EMNLP (IWSLT) 2019 • Jan Niehues, Roldano Cattoni, Sebastian Stüker, Matteo Negri, Marco Turchi, Thanh-Le Ha, Elizabeth Salesky, Ramon Sanabria, Loïc Barrault, Lucia Specia, Marcello Federico
The IWSLT 2019 evaluation campaign featured three tasks: speech translation of (i) TED talks and (ii) How2 instructional videos from English into German and Portuguese, and (iii) text translation of TED talks from English into Czech.
1 code implementation • 10 Apr 2024 • Lucas Goncalves, Prashant Mathur, Chandrashekhar Lavania, Metehan Cekic, Marcello Federico, Kyu J. Han
Recent advancements in audio-visual generative modeling have been propelled by progress in deep learning and the availability of data-rich benchmarks.
1 code implementation • 11 Jan 2024 • Brian Thompson, Mehak Preet Dhaliwal, Peter Frisch, Tobias Domhan, Marcello Federico
We show that content on the web is often translated into many languages, and the low quality of these multi-way translations indicates they were likely created using Machine Translation (MT).
1 code implementation • 1 Nov 2023 • Juan Zuluaga-Gomez, Zhaocheng Huang, Xing Niu, Rohit Paturi, Sundararajan Srinivasan, Prashant Mathur, Brian Thompson, Marcello Federico
Conventional speech-to-text translation (ST) systems are trained on single-speaker utterances, and they may not generalize to real-life scenarios where the audio contains conversations by multiple speakers.
no code implementations • 4 Aug 2023 • Yogesh Virkar, Brian Thompson, Rohit Paturi, Sundararajan Srinivasan, Marcello Federico
The media localization industry usually requires a verbatim script of the final film or TV production in order to create subtitles or dubbing scripts in a foreign language.
no code implementations • 22 May 2023 • Proyag Pal, Brian Thompson, Yogesh Virkar, Prashant Mathur, Alexandra Chronopoulou, Marcello Federico
To translate speech for automatic dubbing, machine translation needs to be isochronous, i.e., translated speech needs to be aligned with the source in terms of speech durations.
1 code implementation • 25 Feb 2023 • Alexandra Chronopoulou, Brian Thompson, Prashant Mathur, Yogesh Virkar, Surafel M. Lakew, Marcello Federico
Automatic dubbing (AD) is the task of translating the original speech in a video into target language speech.
no code implementations • 19 Oct 2022 • Suvodeep Majumder, Stanislas Lauly, Maria Nadejde, Marcello Federico, Georgiana Dinu
This paper addresses the task of contextual translation using multi-segment models.
no code implementations • 11 Oct 2022 • Cuong Hoang, Devendra Sachan, Prashant Mathur, Brian Thompson, Marcello Federico
Several recent studies have reported dramatic performance improvements in neural machine translation (NMT) by augmenting translation at inference time with fuzzy-matches retrieved from a translation memory (TM).
no code implementations • 10 Oct 2022 • Cuong Hoang, Devendra Sachan, Prashant Mathur, Brian Thompson, Marcello Federico
We explore zero-shot adaptation, where a general-domain model has access to customer or domain specific parallel data at inference time, but not during training.
1 code implementation • 27 Sep 2022 • Giorgos Vernikos, Brian Thompson, Prashant Mathur, Marcello Federico
Our experimental results support our initial hypothesis and show that a simple extension of the metrics permits them to take advantage of context to resolve ambiguities in the reference.
2 code implementations • 12 Jul 2022 • Felix Hieber, Michael Denkowski, Tobias Domhan, Barbara Darques Barros, Celina Dong Ye, Xing Niu, Cuong Hoang, Ke Tran, Benjamin Hsu, Maria Nadejde, Surafel Lakew, Prashant Mathur, Anna Currey, Marcello Federico
When running comparable models, Sockeye 3 is up to 126% faster than other PyTorch implementations on GPUs and up to 292% faster on CPUs.
2 code implementations • Findings (NAACL) 2022 • Maria Nădejde, Anna Currey, Benjamin Hsu, Xing Niu, Marcello Federico, Georgiana Dinu
However, in many cases, multiple different translations are valid and the appropriate translation may depend on the intended target audience, characteristics of the speaker, or even the relationship between speakers.
no code implementations • 6 Apr 2022 • Yogesh Virkar, Marcello Federico, Robert Enyedi, Roberto Barra-Chicote
The goal of automatic dubbing is to perform speech-to-speech translation while achieving audiovisual coherence.
no code implementations • 16 Dec 2021 • Derek Tam, Surafel M. Lakew, Yogesh Virkar, Prashant Mathur, Marcello Federico
We introduce the task of isochrony-aware machine translation which aims at generating translations suitable for dubbing.
no code implementations • 16 Dec 2021 • Surafel M. Lakew, Yogesh Virkar, Prashant Mathur, Marcello Federico
Automatic dubbing (AD) is among the machine translation (MT) use cases where translations should match a given length to allow for synchronicity between source and target speech.
no code implementations • 8 Oct 2021 • Surafel M. Lakew, Marcello Federico, Yue Wang, Cuong Hoang, Yogesh Virkar, Roberto Barra-Chicote, Robert Enyedi
Automatic dubbing aims at seamlessly replacing the speech in a video document with synthetic speech in a different language.
no code implementations • NAACL 2021 • Yue Wang, Cuong Hoang, Marcello Federico
We show that our style-augmented translation models are able to capture the style variations of translators and to generate translations with different styles on new data.
no code implementations • EMNLP (NLP-COVID19) 2020 • Antonios Anastasopoulos, Alessandro Cattelan, Zi-Yi Dou, Marcello Federico, Christian Federmann, Dmitriy Genzel, Francisco Guzmán, Junjie Hu, Macduff Hughes, Philipp Koehn, Rosie Lazar, Will Lewis, Graham Neubig, Mengmeng Niu, Alp Öktem, Eric Paquin, Grace Tang, Sylwia Tur
Further, the team is converting the test and development data into translation memories (TMXs) that can be used by localizers from and to any of the languages.
no code implementations • WS 2020 • Ebrahim Ansari, Amittai Axelrod, Nguyen Bach, Ondřej Bojar, Roldano Cattoni, Fahim Dalvi, Nadir Durrani, Marcello Federico, Christian Federmann, Jiatao Gu, Fei Huang, Kevin Knight, Xutai Ma, Ajay Nagesh, Matteo Negri, Jan Niehues, Juan Pino, Elizabeth Salesky, Xing Shi, Sebastian Stüker, Marco Turchi, Alexander Waibel, Changhan Wang
The evaluation campaign of the International Conference on Spoken Language Translation (IWSLT 2020) featured six challenge tracks this year: (i) Simultaneous speech translation, (ii) Video speech translation, (iii) Offline speech translation, (iv) Conversational speech translation, (v) Open domain translation, and (vi) Non-native speech translation.
no code implementations • WS 2020 • Georgiana Dinu, Prashant Mathur, Marcello Federico, Stanislas Lauly, Yaser Al-Onaizan
A variety of natural language tasks require processing of textual data which contains a mix of natural language and formal languages such as mathematical expressions.
no code implementations • WS 2020 • Marcello Federico, Robert Enyedi, Roberto Barra-Chicote, Ritwik Giri, Umut Isik, Arvindh Krishnaswamy, Hassan Sawaf
We present enhancements to a speech-to-speech translation pipeline in order to perform automatic dubbing.
1 code implementation • EMNLP (IWSLT) 2019 • Surafel M. Lakew, Alina Karakanta, Marcello Federico, Matteo Negri, Marco Turchi
To improve NMT for low-resource languages (LRL), we use perplexity to select the high-resource language (HRL) data that are most similar to the LRL on the basis of language distance.
no code implementations • EMNLP (IWSLT) 2019 • Surafel Melaku Lakew, Mattia Di Gangi, Marcello Federico
The recent advances introduced by neural machine translation (NMT) are rapidly expanding the application fields of machine translation, as well as reshaping the quality level to be targeted.
no code implementations • EMNLP (IWSLT) 2019 • Mattia Antonino Di Gangi, Robert Enyedi, Alessandra Brusadin, Marcello Federico
Our experimental results on a public speech translation data set show that adapting a model on a significant amount of parallel data including ASR transcripts is beneficial with test data of the same type, but produces a small degradation when translating clean text.
1 code implementation • WS 2019 • Duygu Ataman, Orhan Firat, Mattia A. Di Gangi, Marcello Federico, Alexandra Birch
Neural Machine Translation (NMT) models generally perform translation using a fixed-size lexical vocabulary, which is an important bottleneck on their generalization capability and overall translation quality.
1 code implementation • 16 Sep 2019 • Surafel M. Lakew, Marcello Federico, Matteo Negri, Marco Turchi
In recent years, Neural Machine Translation (NMT) has been shown to be more effective than phrase-based statistical methods, thus quickly becoming the state of the art in machine translation (MT).
1 code implementation • ACL 2019 • Georgiana Dinu, Prashant Mathur, Marcello Federico, Yaser Al-Onaizan
This paper proposes a novel method to inject custom terminology into neural machine translation at run time.
no code implementations • 24 Apr 2019 • Nicholas Ruiz, Mattia Antonino Di Gangi, Nicola Bertoldi, Marcello Federico
Machine translation systems are conventionally trained on textual resources that do not model phenomena that occur in spoken language.
1 code implementation • 24 Apr 2019 • Nicholas Ruiz, Marcello Federico
We propose a variation to the commonly used Word Error Rate (WER) metric for speech recognition evaluation which incorporates the alignment of phonemes, in the absence of time boundary information.
1 code implementation • IWSLT 2017 • Surafel M. Lakew, Quintino F. Lotito, Matteo Negri, Marco Turchi, Marcello Federico
Recent work on multilingual neural machine translation reported competitive performance with respect to bilingual models and surprisingly good performance even on (zero-shot) translation directions not observed at training time.
2 code implementations • IWSLT (EMNLP) 2018 • Surafel M. Lakew, Aliia Erofeeva, Matteo Negri, Marcello Federico, Marco Turchi
Our approach makes it possible to extend an initial model for a given language pair to cover new languages by adapting its vocabulary as new data become available (i.e., by introducing new vocabulary items if they are not included in the initial model).
no code implementations • WS 2018 • Surafel M. Lakew, Aliia Erofeeva, Marcello Federico
Both research and commercial machine translation have so far neglected the importance of properly handling the spelling, lexical and grammar divergences occurring among language varieties.
no code implementations • COLING 2018 • Surafel M. Lakew, Mauro Cettolo, Marcello Federico
Motivated by this, our work (i) provides a quantitative and comparative analysis of the translations produced by bilingual, multilingual and zero-shot systems; (ii) investigates the translation quality of two of the currently dominant neural architectures in MT, namely the Recurrent and the Transformer architectures; and (iii) quantitatively explores how the closeness between languages influences zero-shot translation.
1 code implementation • 10 May 2018 • Mattia Antonino Di Gangi, Marcello Federico
For years, recurrent neural networks (RNNs) represented the state of the art in neural machine translation.
no code implementations • ACL 2018 • Duygu Ataman, Marcello Federico
By training NMT to compose word representations from character n-grams, our approach consistently outperforms (by 1.71 to 2.48 BLEU points) NMT that learns embeddings of statistically generated sub-word units.
no code implementations • 31 Jul 2017 • Duygu Ataman, Matteo Negri, Marco Turchi, Marcello Federico
In this paper, we propose a new vocabulary reduction method for NMT, which can reduce the vocabulary of a given input corpus at any rate while also considering the morphological properties of the language.
no code implementations • EACL 2017 • M. Amin Farajian, Marco Turchi, Matteo Negri, Nicola Bertoldi, Marcello Federico
State-of-the-art neural machine translation (NMT) systems are generally trained on specific domains by carefully selecting the training sets and applying proper domain adaptation techniques.
no code implementations • EMNLP 2016 • Luisa Bentivogli, Arianna Bisazza, Mauro Cettolo, Marcello Federico
Within the field of Statistical Machine Translation (SMT), the neural approach (NMT) has recently emerged as the first technology able to challenge the long-standing dominance of phrase-based approaches (PBMT).
no code implementations • LREC 2016 • Luisa Bentivogli, Mauro Cettolo, M. Amin Farajian, Marcello Federico
This paper presents WAGS (Word Alignment Gold Standard), a novel benchmark which allows extensive evaluation of WA tools on out-of-vocabulary (OOV) and rare words.
no code implementations • 17 Feb 2015 • Arianna Bisazza, Marcello Federico
Word reordering is one of the most difficult aspects of statistical machine translation (SMT), and an important factor of its quality and efficiency.
no code implementations • COLING 2014 • Marcello Federico, Nicola Bertoldi, Mauro Cettolo, Matteo Negri, Marco Turchi, Marco Trombetti, Alessandro Cattelan, Antonio Farina, Domenico Lupinetti, Andrea Martines, Alberto Massidda, Holger Schwenk, Loïc Barrault, Frederic Blain, Philipp Koehn, Christian Buck, Ulrich Germann
no code implementations • TACL 2013 • Arianna Bisazza, Marcello Federico
Defining the reordering search space is a crucial issue in phrase-based SMT between distant languages.
no code implementations • LREC 2012 • Marcello Federico, Sebastian Stüker, Luisa Bentivogli, Michael Paul, Mauro Cettolo, Teresa Herrmann, Jan Niehues, Giovanni Moretti
We report here on the eighth evaluation campaign organized in 2011 by the IWSLT workshop series.