Search Results for author: Marcello Federico

Found 74 papers, 17 papers with code

FINDINGS OF THE IWSLT 2021 EVALUATION CAMPAIGN

no code implementations ACL (IWSLT) 2021 Antonios Anastasopoulos, Ondřej Bojar, Jacob Bremerman, Roldano Cattoni, Maha Elbayad, Marcello Federico, Xutai Ma, Satoshi Nakamura, Matteo Negri, Jan Niehues, Juan Pino, Elizabeth Salesky, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Alexander Waibel, Changhan Wang, Matthew Wiesner

The evaluation campaign of the International Conference on Spoken Language Translation (IWSLT 2021) featured this year four shared tasks: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Multilingual speech translation, (iv) Low-resource speech translation.

Translation

The IWSLT 2018 Evaluation Campaign

no code implementations IWSLT (EMNLP) 2018 Jan Niehues, Roldano Cattoni, Sebastian Stüker, Mauro Cettolo, Marco Turchi, Marcello Federico

The International Workshop of Spoken Language Translation (IWSLT) 2018 Evaluation Campaign featured two tasks: low-resource machine translation and speech translation.

Machine Translation Translation

A Statistical Extension of Byte-Pair Encoding

1 code implementation ACL (IWSLT) 2021 David Vilar, Marcello Federico

Sub-word segmentation is currently a standard tool for training neural machine translation (MT) systems and other NLP tasks.

Data Compression Machine Translation +1

Adapting Multilingual NMT to Extremely Low Resource Languages: FBK’s Participation in the Basque-English Low-Resource MT Task, IWSLT 2018

no code implementations IWSLT (EMNLP) 2018 Surafel M. Lakew, Marcello Federico

In the experimental setting, an extremely low-resourced Basque-English language pair (i.e., ≈ 5.6K in-domain training data) is our target translation task, where we considered a closely related French/Spanish-English parallel data to build the multilingual model.

Machine Translation NMT +2

Machine Translation Human Evaluation: an investigation of evaluation based on Post-Editing and its relation with Direct Assessment

no code implementations IWSLT (EMNLP) 2018 Luisa Bentivogli, Mauro Cettolo, Marcello Federico, Christian Federmann

In this paper we present an analysis of the two most prominent methodologies used for the human evaluation of MT quality, namely evaluation based on Post-Editing (PE) and evaluation based on Direct Assessment (DA).

Machine Translation

Findings of the IWSLT 2022 Evaluation Campaign

no code implementations IWSLT (ACL) 2022 Antonios Anastasopoulos, Loïc Barrault, Luisa Bentivogli, Marcely Zanon Boito, Ondřej Bojar, Roldano Cattoni, Anna Currey, Georgiana Dinu, Kevin Duh, Maha Elbayad, Clara Emmanuel, Yannick Estève, Marcello Federico, Christian Federmann, Souhir Gahbiche, Hongyu Gong, Roman Grundkiewicz, Barry Haddow, Benjamin Hsu, Dávid Javorský, Věra Kloudová, Surafel Lakew, Xutai Ma, Prashant Mathur, Paul McNamee, Kenton Murray, Maria Nădejde, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, John Ortega, Juan Pino, Elizabeth Salesky, Jiatong Shi, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Yogesh Virkar, Alexander Waibel, Changhan Wang, Shinji Watanabe

The evaluation campaign of the 19th International Conference on Spoken Language Translation featured eight shared tasks: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Speech to speech translation, (iv) Low-resource speech translation, (v) Multilingual speech translation, (vi) Dialect speech translation, (vii) Formality control for speech translation, (viii) Isometric speech translation.

Speech-to-Speech Translation Translation

The IWSLT 2019 Evaluation Campaign

no code implementations EMNLP (IWSLT) 2019 Jan Niehues, Roldano Cattoni, Sebastian Stüker, Matteo Negri, Marco Turchi, Thanh-Le Ha, Elizabeth Salesky, Ramon Sanabria, Loïc Barrault, Lucia Specia, Marcello Federico

The IWSLT 2019 evaluation campaign featured three tasks: speech translation of (i) TED talks and (ii) How2 instructional videos from English into German and Portuguese, and (iii) text translation of TED talks from English into Czech.

Translation

Monolingual Embeddings for Low Resourced Neural Machine Translation

1 code implementation IWSLT 2017 Mattia Antonino Di Gangi, Marcello Federico

When only little data exist for a language pair, the model cannot produce good representations for words, particularly for rare words.

Machine Translation NMT +2

FBK’s Multilingual Neural Machine Translation System for IWSLT 2017

no code implementations IWSLT 2017 Surafel M. Lakew, Quintino F. Lotito, Marco Turchi, Matteo Negri, Marcello Federico

Particularly, we focus on the four zero-shot directions and show how a multilingual model trained with small data can provide reasonable results.

Machine Translation Transfer Learning +1

A Shocking Amount of the Web is Machine Translated: Insights from Multi-Way Parallelism

1 code implementation 11 Jan 2024 Brian Thompson, Mehak Preet Dhaliwal, Peter Frisch, Tobias Domhan, Marcello Federico

We show that content on the web is often translated into many languages, and the low quality of these multi-way translations indicates they were likely created using Machine Translation (MT).

Machine Translation Selection bias

End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation

1 code implementation 1 Nov 2023 Juan Zuluaga-Gomez, Zhaocheng Huang, Xing Niu, Rohit Paturi, Sundararajan Srinivasan, Prashant Mathur, Brian Thompson, Marcello Federico

Conventional speech-to-text translation (ST) systems are trained on single-speaker utterances, and they may not generalize to real-life scenarios where the audio contains conversations by multiple speakers.

Automatic Speech Recognition speech-recognition +3

Speaker Diarization of Scripted Audiovisual Content

no code implementations 4 Aug 2023 Yogesh Virkar, Brian Thompson, Rohit Paturi, Sundararajan Srinivasan, Marcello Federico

The media localization industry usually requires a verbatim script of the final film or TV production in order to create subtitles or dubbing scripts in a foreign language.

speaker-diarization Speaker Diarization +2

Improving Isochronous Machine Translation with Target Factors and Auxiliary Counters

no code implementations 22 May 2023 Proyag Pal, Brian Thompson, Yogesh Virkar, Prashant Mathur, Alexandra Chronopoulou, Marcello Federico

To translate speech for automatic dubbing, machine translation needs to be isochronous, i.e., translated speech needs to be aligned with the source in terms of speech durations.

Machine Translation Translation

Improving Robustness of Retrieval Augmented Translation via Shuffling of Suggestions

no code implementations 11 Oct 2022 Cuong Hoang, Devendra Sachan, Prashant Mathur, Brian Thompson, Marcello Federico

Several recent studies have reported dramatic performance improvements in neural machine translation (NMT) by augmenting translation at inference time with fuzzy-matches retrieved from a translation memory (TM).

Machine Translation NMT +2

Improving Retrieval Augmented Neural Machine Translation by Controlling Source and Fuzzy-Match Interactions

no code implementations 10 Oct 2022 Cuong Hoang, Devendra Sachan, Prashant Mathur, Brian Thompson, Marcello Federico

We explore zero-shot adaptation, where a general-domain model has access to customer or domain specific parallel data at inference time, but not during training.

Machine Translation Retrieval +2

Embarrassingly Easy Document-Level MT Metrics: How to Convert Any Pretrained Metric Into a Document-Level Metric

1 code implementation 27 Sep 2022 Giorgos Vernikos, Brian Thompson, Prashant Mathur, Marcello Federico

Our experimental results support our initial hypothesis and show that a simple extension of the metrics permits them to take advantage of context to resolve ambiguities in the reference.

Machine Translation Sentence

CoCoA-MT: A Dataset and Benchmark for Contrastive Controlled MT with Application to Formality

2 code implementations Findings (NAACL) 2022 Maria Nădejde, Anna Currey, Benjamin Hsu, Xing Niu, Marcello Federico, Georgiana Dinu

However, in many cases, multiple different translations are valid and the appropriate translation may depend on the intended target audience, characteristics of the speaker, or even the relationship between speakers.

Machine Translation Sentence +2

Prosodic Alignment for off-screen automatic dubbing

no code implementations 6 Apr 2022 Yogesh Virkar, Marcello Federico, Robert Enyedi, Roberto Barra-Chicote

The goal of automatic dubbing is to perform speech-to-speech translation while achieving audiovisual coherence.

Speech-to-Speech Translation Translation

Isochrony-Aware Neural Machine Translation for Automatic Dubbing

no code implementations 16 Dec 2021 Derek Tam, Surafel M. Lakew, Yogesh Virkar, Prashant Mathur, Marcello Federico

We introduce the task of isochrony-aware machine translation which aims at generating translations suitable for dubbing.

Machine Translation Sentence +1

Isometric MT: Neural Machine Translation for Automatic Dubbing

no code implementations 16 Dec 2021 Surafel M. Lakew, Yogesh Virkar, Prashant Mathur, Marcello Federico

Automatic dubbing (AD) is among the machine translation (MT) use cases where translations should match a given length to allow for synchronicity between source and target speech.

Machine Translation Re-Ranking +2

Machine Translation Verbosity Control for Automatic Dubbing

no code implementations 8 Oct 2021 Surafel M. Lakew, Marcello Federico, Yue Wang, Cuong Hoang, Yogesh Virkar, Roberto Barra-Chicote, Robert Enyedi

Automatic dubbing aims at seamlessly replacing the speech in a video document with synthetic speech in a different language.

Machine Translation Translation

Towards Modeling the Style of Translators in Neural Machine Translation

no code implementations NAACL 2021 Yue Wang, Cuong Hoang, Marcello Federico

We show that our style-augmented translation models are able to capture the style variations of translators and to generate translations with different styles on new data.

Machine Translation Translation

FINDINGS OF THE IWSLT 2020 EVALUATION CAMPAIGN

no code implementations WS 2020 Ebrahim Ansari, Amittai Axelrod, Nguyen Bach, Ondřej Bojar, Roldano Cattoni, Fahim Dalvi, Nadir Durrani, Marcello Federico, Christian Federmann, Jiatao Gu, Fei Huang, Kevin Knight, Xutai Ma, Ajay Nagesh, Matteo Negri, Jan Niehues, Juan Pino, Elizabeth Salesky, Xing Shi, Sebastian Stüker, Marco Turchi, Alexander Waibel, Changhan Wang

The evaluation campaign of the International Conference on Spoken Language Translation (IWSLT 2020) featured this year six challenge tracks: (i) Simultaneous speech translation, (ii) Video speech translation, (iii) Offline speech translation, (iv) Conversational speech translation, (v) Open domain translation, and (vi) Non-native speech translation.

Translation

Joint translation and unit conversion for end-to-end localization

no code implementations WS 2020 Georgiana Dinu, Prashant Mathur, Marcello Federico, Stanislas Lauly, Yaser Al-Onaizan

A variety of natural language tasks require processing of textual data which contains a mix of natural language and formal languages such as mathematical expressions.

Data Augmentation Translation

Controlling the Output Length of Neural Machine Translation

no code implementations EMNLP (IWSLT) 2019 Surafel Melaku Lakew, Mattia Di Gangi, Marcello Federico

The recent advances introduced by neural machine translation (NMT) are rapidly expanding the application fields of machine translation, as well as reshaping the quality level to be targeted.

Machine Translation NMT +1

Robust Neural Machine Translation for Clean and Noisy Speech Transcripts

no code implementations EMNLP (IWSLT) 2019 Mattia Antonino Di Gangi, Robert Enyedi, Alessandra Brusadin, Marcello Federico

Our experimental results on a public speech translation data set show that adapting a model on a significant amount of parallel data including ASR transcripts is beneficial with test data of the same type, but produces a small degradation when translating clean text.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

On the Importance of Word Boundaries in Character-level Neural Machine Translation

1 code implementation WS 2019 Duygu Ataman, Orhan Firat, Mattia A. Di Gangi, Marcello Federico, Alexandra Birch

Neural Machine Translation (NMT) models generally perform translation using a fixed-size lexical vocabulary, which is an important bottleneck on their generalization capability and overall translation quality.

Machine Translation NMT +1

Multilingual Neural Machine Translation for Zero-Resource Languages

1 code implementation 16 Sep 2019 Surafel M. Lakew, Marcello Federico, Matteo Negri, Marco Turchi

In recent years, Neural Machine Translation (NMT) has been shown to be more effective than phrase-based statistical methods, thus quickly becoming the state of the art in machine translation (MT).

Machine Translation NMT +1

Phonetically-Oriented Word Error Alignment for Speech Recognition Error Analysis in Speech Translation

1 code implementation 24 Apr 2019 Nicholas Ruiz, Marcello Federico

We propose a variation to the commonly used Word Error Rate (WER) metric for speech recognition evaluation which incorporates the alignment of phonemes, in the absence of time boundary information.

speech-recognition Speech Recognition +2

Improving Zero-Shot Translation of Low-Resource Languages

1 code implementation IWSLT 2017 Surafel M. Lakew, Quintino F. Lotito, Matteo Negri, Marco Turchi, Marcello Federico

Recent work on multilingual neural machine translation reported competitive performance with respect to bilingual models and surprisingly good performance even on (zeroshot) translation directions not observed at training time.

Machine Translation Translation

Transfer Learning in Multilingual Neural Machine Translation with Dynamic Vocabulary

2 code implementations IWSLT (EMNLP) 2018 Surafel M. Lakew, Aliia Erofeeva, Matteo Negri, Marcello Federico, Marco Turchi

Our approach allows to extend an initial model for a given language pair to cover new languages by adapting its vocabulary as long as new data become available (i.e., introducing new vocabulary items if they are not included in the initial model).

Machine Translation NMT +2

Neural Machine Translation into Language Varieties

no code implementations WS 2018 Surafel M. Lakew, Aliia Erofeeva, Marcello Federico

Both research and commercial machine translation have so far neglected the importance of properly handling the spelling, lexical and grammar divergences occurring among language varieties.

Machine Translation Translation

A Comparison of Transformer and Recurrent Neural Networks on Multilingual Neural Machine Translation

no code implementations COLING 2018 Surafel M. Lakew, Mauro Cettolo, Marcello Federico

Motivated by this, our work (i) provides a quantitative and comparative analysis of the translations produced by bilingual, multilingual and zero-shot systems; (ii) investigates the translation quality of two of the currently dominant neural architectures in MT, which are the Recurrent and the Transformer ones; and (iii) quantitatively explores how the closeness between languages influences the zero-shot translation.

Machine Translation NMT +2

Deep Neural Machine Translation with Weakly-Recurrent Units

1 code implementation 10 May 2018 Mattia Antonino Di Gangi, Marcello Federico

Recurrent neural networks (RNNs) have represented for years the state of the art in neural machine translation.

Machine Translation NMT +2

Compositional Representation of Morphologically-Rich Input for Neural Machine Translation

no code implementations ACL 2018 Duygu Ataman, Marcello Federico

By training NMT to compose word representations from character n-grams, our approach consistently outperforms (from 1.71 to 2.48 BLEU points) NMT learning embeddings of statistically generated sub-word units.

Machine Translation NMT +1

Linguistically Motivated Vocabulary Reduction for Neural Machine Translation from Turkish to English

no code implementations 31 Jul 2017 Duygu Ataman, Matteo Negri, Marco Turchi, Marcello Federico

In this paper, we propose a new vocabulary reduction method for NMT, which can reduce the vocabulary of a given input corpus at any rate while also considering the morphological properties of the language.

Machine Translation Morphological Analysis +2

Neural vs. Phrase-Based Machine Translation in a Multi-Domain Scenario

no code implementations EACL 2017 M. Amin Farajian, Marco Turchi, Matteo Negri, Nicola Bertoldi, Marcello Federico

State-of-the-art neural machine translation (NMT) systems are generally trained on specific domains by carefully selecting the training sets and applying proper domain adaptation techniques.

Domain Adaptation Machine Translation +2

Neural versus Phrase-Based Machine Translation Quality: a Case Study

no code implementations EMNLP 2016 Luisa Bentivogli, Arianna Bisazza, Mauro Cettolo, Marcello Federico

Within the field of Statistical Machine Translation (SMT), the neural approach (NMT) has recently emerged as the first technology able to challenge the long-standing dominance of phrase-based approaches (PBMT).

Machine Translation NMT +1

WAGS: A Beautiful English-Italian Benchmark Supporting Word Alignment Evaluation on Rare Words

no code implementations LREC 2016 Luisa Bentivogli, Mauro Cettolo, M. Amin Farajian, Marcello Federico

This paper presents WAGS (Word Alignment Gold Standard), a novel benchmark which allows extensive evaluation of WA tools on out-of-vocabulary (OOV) and rare words.

Sentence Word Alignment

A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena

no code implementations 17 Feb 2015 Arianna Bisazza, Marcello Federico

Word reordering is one of the most difficult aspects of statistical machine translation (SMT), and an important factor of its quality and efficiency.

Machine Translation Translation
