Search Results for author: Marco Turchi

Found 121 papers, 27 papers with code

Automatic Translation for Multiple NLP tasks: a Multi-task Approach to Machine-oriented NMT Adaptation

no code implementations • EAMT 2020 • Amirhossein Tebbifakhr, Matteo Negri, Marco Turchi

We address this problem by proposing a multi-task approach to machine-oriented NMT adaptation, which is capable of serving multiple downstream tasks with a single system.

Machine Translation NMT +1

The IWSLT 2018 Evaluation Campaign

no code implementations • IWSLT (EMNLP) 2018 • Jan Niehues, Roldano Cattoni, Sebastian Stüker, Mauro Cettolo, Marco Turchi, Marcello Federico

The International Workshop of Spoken Language Translation (IWSLT) 2018 Evaluation Campaign featured two tasks: low-resource machine translation and speech translation.

Machine Translation Translation

Findings of the IWSLT 2022 Evaluation Campaign

no code implementations • IWSLT (ACL) 2022 • Antonios Anastasopoulos, Loïc Barrault, Luisa Bentivogli, Marcely Zanon Boito, Ondřej Bojar, Roldano Cattoni, Anna Currey, Georgiana Dinu, Kevin Duh, Maha Elbayad, Clara Emmanuel, Yannick Estève, Marcello Federico, Christian Federmann, Souhir Gahbiche, Hongyu Gong, Roman Grundkiewicz, Barry Haddow, Benjamin Hsu, Dávid Javorský, Vĕra Kloudová, Surafel Lakew, Xutai Ma, Prashant Mathur, Paul McNamee, Kenton Murray, Maria Nǎdejde, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, John Ortega, Juan Pino, Elizabeth Salesky, Jiatong Shi, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Yogesh Virkar, Alexander Waibel, Changhan Wang, Shinji Watanabe

The evaluation campaign of the 19th International Conference on Spoken Language Translation featured eight shared tasks: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Speech to speech translation, (iv) Low-resource speech translation, (v) Multilingual speech translation, (vi) Dialect speech translation, (vii) Formality control for speech translation, (viii) Isometric speech translation.

Speech-to-Speech Translation Translation

FBK’s Multilingual Neural Machine Translation System for IWSLT 2017

no code implementations • IWSLT 2017 • Surafel M. Lakew, Quintino F. Lotito, Marco Turchi, Matteo Negri, Marcello Federico

Particularly, we focus on the four zero-shot directions and show how a multilingual model trained with small data can provide reasonable results.

Machine Translation Transfer Learning +1

Findings of the 2021 Conference on Machine Translation (WMT21)

no code implementations • WMT (EMNLP) 2021 • Farhad Akhbardeh, Arkady Arkhangorodsky, Magdalena Biesialska, Ondřej Bojar, Rajen Chatterjee, Vishrav Chaudhary, Marta R. Costa-Jussa, Cristina España-Bonet, Angela Fan, Christian Federmann, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Barry Haddow, Leonie Harter, Kenneth Heafield, Christopher Homan, Matthias Huck, Kwabena Amponsah-Kaakyire, Jungo Kasai, Daniel Khashabi, Kevin Knight, Tom Kocmi, Philipp Koehn, Nicholas Lourie, Christof Monz, Makoto Morishita, Masaaki Nagata, Ajay Nagesh, Toshiaki Nakazawa, Matteo Negri, Santanu Pal, Allahsera Auguste Tapo, Marco Turchi, Valentin Vydrin, Marcos Zampieri

This paper presents the results of the news translation task, the multilingual low-resource translation for Indo-European languages, the triangular translation task, and the automatic post-editing task organised as part of the Conference on Machine Translation (WMT) 2021. In the news task, participants were asked to build machine translation systems for any of 10 language pairs, to be evaluated on test sets consisting mainly of news stories.

Machine Translation Translation

Machine-oriented NMT Adaptation for Zero-shot NLP tasks: Comparing the Usefulness of Close and Distant Languages

no code implementations • VarDial (COLING) 2020 • Amirhossein Tebbifakhr, Matteo Negri, Marco Turchi

In this work, we tackle the problem in a multilingual setting where a single NMT model translates from multiple languages for downstream automatic processing in the target language.

Machine Translation NMT

FINDINGS OF THE IWSLT 2021 EVALUATION CAMPAIGN

no code implementations • ACL (IWSLT) 2021 • Antonios Anastasopoulos, Ondřej Bojar, Jacob Bremerman, Roldano Cattoni, Maha Elbayad, Marcello Federico, Xutai Ma, Satoshi Nakamura, Matteo Negri, Jan Niehues, Juan Pino, Elizabeth Salesky, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Alexander Waibel, Changhan Wang, Matthew Wiesner

The evaluation campaign of the International Conference on Spoken Language Translation (IWSLT 2021) featured this year four shared tasks: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Multilingual speech translation, (iv) Low-resource speech translation.

Translation

CEF Data Marketplace: Powering a Long-term Supply of Language Data

no code implementations • EAMT 2020 • Amir Kamran, Dace Dzeguze, Jaap van der Meer, Milica Panic, Alessandro Cattelan, Daniele Patrioli, Luisa Bentivogli, Marco Turchi

We describe the CEF Data Marketplace project, which focuses on the development of a trading platform of translation data for language professionals: translators, machine translation (MT) developers, language service providers (LSPs), translation buyers and government bodies.

Machine Translation Translation

Findings of the WMT 2020 Shared Task on Automatic Post-Editing

no code implementations • WMT (EMNLP) 2020 • Rajen Chatterjee, Markus Freitag, Matteo Negri, Marco Turchi

Due to i) the different source/domain of data compared to the past (Wikipedia vs Information Technology), ii) the different quality of the initial translations to be corrected and iii) the introduction of a new language pair (English-Chinese), this year’s results are not directly comparable with last year’s round.

Automatic Post-Editing NMT

Zero-Shot Neural Machine Translation with Self-Learning Cycle

no code implementations • MTSummit 2021 • Surafel M. Lakew, Matteo Negri, Marco Turchi

Neural Machine Translation (NMT) approaches employing monolingual data are showing steady improvements in resource-rich conditions.

Machine Translation NMT +2

On the Dynamics of Gender Learning in Speech Translation

no code implementations • NAACL (GeBNLP) 2022 • Beatrice Savoldi, Marco Gaido, Luisa Bentivogli, Matteo Negri, Marco Turchi

In this work, we contribute to such a line of inquiry by exploring the emergence of gender bias in Speech Translation (ST).

Translation

Data Augmentation for End-to-End Speech Translation: FBK@IWSLT ‘19

no code implementations • EMNLP (IWSLT) 2019 • Mattia A. Di Gangi, Matteo Negri, Viet Nhat Nguyen, Amirhossein Tebbifakhr, Marco Turchi

On the training side, we focused on data augmentation techniques recently proposed for ST and automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

The IWSLT 2019 Evaluation Campaign

no code implementations • EMNLP (IWSLT) 2019 • Jan Niehues, Roldano Cattoni, Sebastian Stüker, Matteo Negri, Marco Turchi, Thanh-Le Ha, Elizabeth Salesky, Ramon Sanabria, Loïc Barrault, Lucia Specia, Marcello Federico

The IWSLT 2019 evaluation campaign featured three tasks: speech translation of (i) TED talks and (ii) How2 instructional videos from English into German and Portuguese, and (iii) text translation of TED talks from English into Czech.

Translation

Translation Quality and Productivity: A Study on Rich Morphology Languages

no code implementations • MTSummit 2017 • Lucia Specia, Kim Harris, Frédéric Blain, Aljoscha Burchardt, Vivien Macketanz, Inguna Skadiņa, Matteo Negri, Marco Turchi

Translation

Instance Selection for Online Automatic Post-Editing in a multi-domain scenario

no code implementations • AMTA 2016 • Rajen Chatterjee, Mihael Arcan, Matteo Negri, Marco Turchi

In recent years, several end-to-end online translation systems have been proposed to successfully incorporate human post-editing feedback in the translation workflow.

Automatic Post-Editing Decoder +3

Is “moby dick” a Whale or a Bird? Named Entities and Terminology in Speech Translation

no code implementations • EMNLP 2021 • Marco Gaido, Susana Rodríguez, Matteo Negri, Luisa Bentivogli, Marco Turchi

Automatic translation systems are known to struggle with rare words.

Translation

FBK’s Neural Machine Translation Systems for IWSLT 2016

no code implementations • IWSLT 2016 • M. Amin Farajian, Rajen Chatterjee, Costanza Conforti, Shahab Jalalvand, Vevake Balaraman, Mattia A. Di Gangi, Duygu Ataman, Marco Turchi, Matteo Negri, Marcello Federico

They leverage linguistic information such as lemmas and part-of-speech tags of the source words in the form of additional factors along with the words.

Decoder Machine Translation +2

Post-editing in Automatic Subtitling: A Subtitlers’ perspective

1 code implementation • EAMT 2022 • Alina Karakanta, Luisa Bentivogli, Mauro Cettolo, Matteo Negri, Marco Turchi

Subtitling tools are recently being adapted for post-editing by providing automatically generated subtitles, and featuring not only machine translation, but also automatic segmentation and synchronisation.

Machine Translation Translation

Towards a methodology for evaluating automatic subtitling

no code implementations • EAMT 2022 • Alina Karakanta, Luisa Bentivogli, Mauro Cettolo, Matteo Negri, Marco Turchi

In response to the increasing interest towards automatic subtitling, this EAMT-funded project aimed at collecting subtitle post-editing data in a real use case scenario where professional subtitlers edit automatically generated subtitles.

Segmentation

Extending the MuST-C Corpus for a Comparative Evaluation of Speech Translation Technology

no code implementations • EAMT 2022 • Luisa Bentivogli, Mauro Cettolo, Marco Gaido, Alina Karakanta, Matteo Negri, Marco Turchi

This project aimed at extending the test sets of the MuST-C speech translation (ST) corpus with new reference translations.

Machine Translation Translation

AlignAtt: Using Attention-based Audio-Translation Alignments as a Guide for Simultaneous Speech Translation

2 code implementations • 19 May 2023 • Sara Papi, Marco Turchi, Matteo Negri

Attention is the core mechanism of today's most used architectures for natural language processing and has been analyzed from many perspectives, including its effectiveness for machine translation-related tasks.

Machine Translation Translation +1

Attention as a Guide for Simultaneous Speech Translation

2 code implementations • 15 Dec 2022 • Sara Papi, Matteo Negri, Marco Turchi

The study of the attention mechanism has sparked interest in many fields, such as language modeling and machine translation.

Decoder Language Modelling +2

Joint Speech Translation and Named Entity Recognition

1 code implementation • 21 Oct 2022 • Marco Gaido, Sara Papi, Matteo Negri, Marco Turchi

Modern automatic translation systems aim at placing the human at the center by providing contextual support and knowledge.

Computational Efficiency Entity Linking +4

Direct Speech Translation for Automatic Subtitling

1 code implementation • 27 Sep 2022 • Sara Papi, Marco Gaido, Alina Karakanta, Mauro Cettolo, Matteo Negri, Marco Turchi

Automatic subtitling is the task of automatically translating the speech of audiovisual content into short pieces of timed text, i.e., subtitles and their corresponding timestamps.

Translation

Dodging the Data Bottleneck: Automatic Subtitling with Automatically Segmented ST Corpora

1 code implementation • 21 Sep 2022 • Sara Papi, Alina Karakanta, Matteo Negri, Marco Turchi

Speech translation for subtitling (SubST) is the task of automatically translating speech data into well-formed subtitles by inserting subtitle breaks compliant to specific displaying guidelines.

Translation

Over-Generation Cannot Be Rewarded: Length-Adaptive Average Lagging for Simultaneous Speech Translation

1 code implementation • NAACL (AutoSimTrans) 2022 • Sara Papi, Marco Gaido, Matteo Negri, Marco Turchi

Simultaneous speech translation (SimulST) systems aim at generating their output with the lowest possible latency, which is normally computed in terms of Average Lagging (AL).

Translation

Who Are We Talking About? Handling Person Names in Speech Translation

1 code implementation • IWSLT (ACL) 2022 • Marco Gaido, Matteo Negri, Marco Turchi

Recent work has shown that systems for speech translation (ST) -- similarly to automatic speech recognition (ASR) -- poorly handle person names.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Efficient yet Competitive Speech Translation: FBK@IWSLT2022

1 code implementation • IWSLT (ACL) 2022 • Marco Gaido, Sara Papi, Dennis Fucci, Giuseppe Fiameni, Matteo Negri, Marco Turchi

The primary goal of FBK's systems submission to the IWSLT 2022 offline and simultaneous speech translation tasks is to reduce model training costs without sacrificing translation quality.

Sentence Translation

Does Simultaneous Speech Translation need Simultaneous Models?

1 code implementation • 8 Apr 2022 • Sara Papi, Marco Gaido, Matteo Negri, Marco Turchi

In simultaneous speech translation (SimulST), finding the best trade-off between high translation quality and low latency is a challenging task.

Translation

Under the Morphosyntactic Lens: A Multifaceted Evaluation of Gender Bias in Speech Translation

1 code implementation • ACL 2022 • Beatrice Savoldi, Marco Gaido, Luisa Bentivogli, Matteo Negri, Marco Turchi

Gender bias is largely recognized as a problematic phenomenon affecting language technologies, with recent studies underscoring that it might surface differently across languages.

POS Translation

Visualization: the missing factor in Simultaneous Speech Translation

no code implementations • 31 Oct 2021 • Sara Papi, Matteo Negri, Marco Turchi

Simultaneous speech translation (SimulST) is the task in which output generation has to be performed on partial, incremental speech input.

Translation

Is "moby dick" a Whale or a Bird? Named Entities and Terminology in Speech Translation

1 code implementation • 15 Sep 2021 • Marco Gaido, Susana Rodríguez, Matteo Negri, Luisa Bentivogli, Marco Turchi

Automatic translation systems are known to struggle with rare words.

Translation

Speechformer: Reducing Information Loss in Direct Speech Translation

1 code implementation • EMNLP 2021 • Sara Papi, Marco Gaido, Matteo Negri, Marco Turchi

Transformer-based models have gained increasing popularity achieving state-of-the-art performance in many research fields including speech translation.

Ranked #1 on Speech-to-Text Translation on MuST-C EN->NL

Speech-to-Text Translation Translation

Simultaneous Speech Translation for Live Subtitling: from Delay to Display

1 code implementation • MTSummit 2021 • Alina Karakanta, Sara Papi, Matteo Negri, Marco Turchi

Experiments on three language pairs (en→it, de, fr) show that scrolling lines is the only mode achieving an acceptable reading speed while keeping delay close to a 4-second threshold.

Translation

Between Flexibility and Consistency: Joint Generation of Captions and Subtitles

1 code implementation • ACL (IWSLT) 2021 • Alina Karakanta, Marco Gaido, Matteo Negri, Marco Turchi

Speech translation (ST) has lately received growing interest for the generation of subtitles without the need for an intermediate source language transcription and timing (i.e., captions).

Translation

Dealing with training and test segmentation mismatch: FBK@IWSLT2021

no code implementations • ACL (IWSLT) 2021 • Sara Papi, Marco Gaido, Matteo Negri, Marco Turchi

Both knowledge distillation and the first fine-tuning step are carried out on manually segmented real and synthetic data, the latter being generated with an MT system trained on the available corpora.

Action Detection Activity Detection +4

Cascade versus Direct Speech Translation: Do the Differences Still Make a Difference?

no code implementations • ACL 2021 • Luisa Bentivogli, Mauro Cettolo, Marco Gaido, Alina Karakanta, Alberto Martinelli, Matteo Negri, Marco Turchi

Five years after the first published proofs of concept, direct approaches to speech translation (ST) are now competing with traditional cascade solutions.

Translation

How to Split: the Effect of Word Segmentation on Gender Bias in Speech Translation

1 code implementation • Findings (ACL) 2021 • Marco Gaido, Beatrice Savoldi, Luisa Bentivogli, Matteo Negri, Marco Turchi

In light of this finding, we propose a combined approach that preserves BPE overall translation quality, while leveraging the higher ability of character-based segmentation to properly translate gender.

Segmentation Translation

Beyond Voice Activity Detection: Hybrid Audio Segmentation for Direct Speech Translation

no code implementations • ICNLSP 2021 • Marco Gaido, Matteo Negri, Mauro Cettolo, Marco Turchi

The audio segmentation mismatch between training data and those seen at run-time is a major problem in direct speech translation.

Action Detection Activity Detection +2

Gender Bias in Machine Translation

1 code implementation • 13 Apr 2021 • Beatrice Savoldi, Marco Gaido, Luisa Bentivogli, Matteo Negri, Marco Turchi

Machine translation (MT) technology has facilitated our daily tasks by providing accessible shortcuts for gathering, elaborating and communicating information.

Machine Translation Translation

Tutorial Proposal: End-to-End Speech Translation

no code implementations • EACL 2021 • Jan Niehues, Elizabeth Salesky, Marco Turchi, Matteo Negri

Speech translation is the translation of speech in one language typically to text in another, traditionally accomplished through a combination of automatic speech recognition and machine translation.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Self-Learning for Zero Shot Neural Machine Translation

no code implementations • 10 Mar 2021 • Surafel M. Lakew, Matteo Negri, Marco Turchi

Neural Machine Translation (NMT) approaches employing monolingual data are showing steady improvements in resource rich conditions.

Machine Translation NMT +2

The Multilingual TEDx Corpus for Speech Recognition and Translation

no code implementations • 2 Feb 2021 • Elizabeth Salesky, Matthew Wiesner, Jacob Bremerman, Roldano Cattoni, Matteo Negri, Marco Turchi, Douglas W. Oard, Matt Post

We present the Multilingual TEDx corpus, built to support speech recognition (ASR) and speech translation (ST) research across many non-English source languages.

speech-recognition Speech Recognition +1

CTC-based Compression for Direct Speech Translation

1 code implementation • EACL 2021 • Marco Gaido, Mauro Cettolo, Matteo Negri, Marco Turchi

Previous studies demonstrated that a dynamic phone-informed compression of the input audio is beneficial for speech translation (ST).

Translation

Breeding Gender-aware Direct Speech Translation Systems

no code implementations • COLING 2020 • Marco Gaido, Beatrice Savoldi, Luisa Bentivogli, Matteo Negri, Marco Turchi

In particular, by translating speech audio data without intermediate transcription, direct ST models are able to leverage and preserve essential information present in the input (e.g., speaker's vocal characteristics) that is otherwise lost in the cascade framework.

Machine Translation Translation

On Knowledge Distillation for Direct Speech Translation

1 code implementation • 9 Dec 2020 • Marco Gaido, Mattia A. Di Gangi, Matteo Negri, Marco Turchi

Direct speech translation (ST) has been shown to be a complex task requiring knowledge transfer from its sub-tasks: automatic speech recognition (ASR) and machine translation (MT).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

The Two Shades of Dubbing in Neural Machine Translation

no code implementations • COLING 2020 • Alina Karakanta, Supratik Bhattacharya, Shravan Nayak, Timo Baumann, Matteo Negri, Marco Turchi

Dubbing has two shades; synchronisation constraints are applied only when the actor's mouth is visible on screen, while the translation is unconstrained for off-screen dubbing.

Machine Translation Translation +1

On Target Segmentation for Direct Speech Translation

no code implementations • AMTA 2020 • Mattia Antonino Di Gangi, Marco Gaido, Matteo Negri, Marco Turchi

Then, subword-level segmentation became the state of the art in neural machine translation as it produces shorter sequences that reduce the training time, while being superior to word-level models.

Data Augmentation Machine Translation +2

Contextualized Translation of Automatically Segmented Speech

1 code implementation • 5 Aug 2020 • Marco Gaido, Mattia Antonino Di Gangi, Matteo Negri, Mauro Cettolo, Marco Turchi

We show that our context-aware solution is more robust to VAD-segmented input, outperforming a strong base model and the fine-tuning on different VAD segmentations of an English-German test set by up to 4.25 BLEU points.

Segmentation Sentence +2

FINDINGS OF THE IWSLT 2020 EVALUATION CAMPAIGN

no code implementations • WS 2020 • Ebrahim Ansari, Amittai Axelrod, Nguyen Bach, Ondřej Bojar, Roldano Cattoni, Fahim Dalvi, Nadir Durrani, Marcello Federico, Christian Federmann, Jiatao Gu, Fei Huang, Kevin Knight, Xutai Ma, Ajay Nagesh, Matteo Negri, Jan Niehues, Juan Pino, Elizabeth Salesky, Xing Shi, Sebastian Stüker, Marco Turchi, Alexander Waibel, Changhan Wang

The evaluation campaign of the International Conference on Spoken Language Translation (IWSLT 2020) featured this year six challenge tracks: (i) Simultaneous speech translation, (ii) Video speech translation, (iii) Offline speech translation, (iv) Conversational speech translation, (v) Open domain translation, and (vi) Non-native speech translation.

Translation

Gender in Danger? Evaluating Speech Translation Technology on the MuST-SHE Corpus

no code implementations • ACL 2020 • Luisa Bentivogli, Beatrice Savoldi, Matteo Negri, Mattia Antonino Di Gangi, Roldano Cattoni, Marco Turchi

Translating from languages without productive grammatical gender like English into gender-marked languages is a well-known difficulty for machines.

Machine Translation Sentence +1

End-to-End Speech-Translation with Knowledge Distillation: FBK@IWSLT2020

no code implementations • WS 2020 • Marco Gaido, Mattia Antonino Di Gangi, Matteo Negri, Marco Turchi

The test talks are provided in two versions: one contains the data already segmented with automatic tools and the other is the raw data without any segmentation.

Data Augmentation Knowledge Distillation +3

Is 42 the Answer to Everything in Subtitling-oriented Speech Translation?

no code implementations • WS 2020 • Alina Karakanta, Matteo Negri, Marco Turchi

Subtitling is becoming increasingly important for disseminating information, given the enormous amounts of audiovisual content becoming available daily.

Machine Translation NMT +1

Low Resource Neural Machine Translation: A Benchmark for Five African Languages

1 code implementation • 31 Mar 2020 • Surafel M. Lakew, Matteo Negri, Marco Turchi

Recent advances in Neural Machine Translation (NMT) have shown improvements in low-resource language (LRL) translation tasks.

Low-Resource Neural Machine Translation NMT +2

MuST-Cinema: a Speech-to-Subtitles corpus

no code implementations • LREC 2020 • Alina Karakanta, Matteo Negri, Marco Turchi

Growing needs in localising audiovisual content in multiple languages through subtitles call for the development of automatic solutions for human subtitling.

Machine Translation NMT +1

Adapting Multilingual Neural Machine Translation to Unseen Languages

1 code implementation • EMNLP (IWSLT) 2019 • Surafel M. Lakew, Alina Karakanta, Marcello Federico, Matteo Negri, Marco Turchi

In order to improve NMT for LRL, we employ perplexity to select HRL data that are most similar to the LRL on the basis of language distance.

Data Augmentation Machine Translation +2

Instance-Based Model Adaptation For Direct Speech Translation

no code implementations • 23 Oct 2019 • Mattia Antonino Di Gangi, Viet-Nhat Nguyen, Matteo Negri, Marco Turchi

Despite recent technology advancements, the effectiveness of neural approaches to end-to-end speech-to-text translation is still limited by the paucity of publicly available training corpora.

Domain Adaptation Speech-to-Text Translation +1

One-To-Many Multilingual End-to-end Speech Translation

no code implementations • 8 Oct 2019 • Mattia Antonino Di Gangi, Matteo Negri, Marco Turchi

Multilingual solutions are widely studied in MT and usually rely on "target forcing", in which multilingual parallel data are combined to train a single model by prepending to the input sequences a language token that specifies the target language.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Machine Translation for Machines: the Sentiment Classification Use Case

no code implementations • IJCNLP 2019 • Amirhossein Tebbifakhr, Luisa Bentivogli, Matteo Negri, Marco Turchi

Towards this objective, we present a reinforcement learning technique based on a new candidate sampling strategy, which exploits the results obtained on the downstream task as weak feedback.

Classification General Classification +7

Multilingual Neural Machine Translation for Zero-Resource Languages

1 code implementation • 16 Sep 2019 • Surafel M. Lakew, Marcello Federico, Matteo Negri, Marco Turchi

In recent years, Neural Machine Translation (NMT) has been shown to be more effective than phrase-based statistical methods, thus quickly becoming the state of the art in machine translation (MT).

Machine Translation NMT +1

Improving Translations by Combining Fuzzy-Match Repair with Automatic Post-Editing

no code implementations • WS 2019 • John Ortega, Felipe Sánchez-Martínez, Marco Turchi, Matteo Negri

Automatic Post-Editing

Enhancing Transformer for End-to-end Speech-to-Text Translation

no code implementations • WS 2019 • Mattia Antonino Di Gangi, Matteo Negri, Roldano Cattoni, Roberto Dessi, Marco Turchi

Speech-to-Text Translation Translation

Findings of the WMT 2019 Shared Task on Automatic Post-Editing

no code implementations • WS 2019 • Rajen Chatterjee, Christian Federmann, Matteo Negri, Marco Turchi

Seven teams participated in the English-German task, with a total of 18 submitted runs.

Automatic Post-Editing Translation

Effort-Aware Neural Automatic Post-Editing

no code implementations • WS 2019 • Amirhossein Tebbifakhr, Matteo Negri, Marco Turchi

For this purpose, following the common approach in multilingual NMT, we prepend a special token to the beginning of both the source text and the MT output indicating the required amount of post-editing.

Automatic Post-Editing NMT +1

MuST-C: a Multilingual Speech Translation Corpus

no code implementations • NAACL 2019 • Mattia A. Di Gangi, Roldano Cattoni, Luisa Bentivogli, Matteo Negri, Marco Turchi

Current research on spoken language translation (SLT) has to contend with the scarcity of sizeable and publicly available training corpora.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Neural Text Simplification in Low-Resource Conditions Using Weak Supervision

no code implementations • WS 2019 • Alessio Palmero Aprosio, Sara Tonelli, Marco Turchi, Matteo Negri, Mattia A. Di Gangi

Inspired by the machine translation field, in which synthetic parallel pairs generated from monolingual data yield significant improvements to neural models, in this paper we exploit large amounts of heterogeneous data to automatically select simple sentences, which are then used to create synthetic simplification pairs.

Machine Translation Sentence +3

Improving Zero-Shot Translation of Low-Resource Languages

1 code implementation • IWSLT 2017 • Surafel M. Lakew, Quintino F. Lotito, Matteo Negri, Marco Turchi, Marcello Federico

Recent work on multilingual neural machine translation reported competitive performance with respect to bilingual models and surprisingly good performance even on (zeroshot) translation directions not observed at training time.

Machine Translation Translation

Transfer Learning in Multilingual Neural Machine Translation with Dynamic Vocabulary

2 code implementations • IWSLT (EMNLP) 2018 • Surafel M. Lakew, Aliia Erofeeva, Matteo Negri, Marcello Federico, Marco Turchi

Our approach allows extending an initial model for a given language pair to cover new languages by adapting its vocabulary as long as new data become available (i.e., introducing new vocabulary items if they are not included in the initial model).

Machine Translation NMT +2

Generating E-Commerce Product Titles and Predicting their Quality

no code implementations • WS 2018 • José G. Camargo de Souza, Michael Kozielski, Prashant Mathur, Ernie Chang, Marco Guerini, Matteo Negri, Marco Turchi, Evgeny Matusov

The setting requires the generation process to be fast and the generated title to be both human-readable and concise.

Text Generation

Fine-tuning on Clean Data for End-to-End Speech Translation: FBK @ IWSLT 2018

no code implementations • IWSLT (EMNLP) 2018 • Mattia Antonino Di Gangi, Roberto Dessì, Roldano Cattoni, Matteo Negri, Marco Turchi

This paper describes FBK's submission to the end-to-end English-German speech translation task at IWSLT 2018.

Machine Translation Translation

Multi-source transformer with combined losses for automatic post editing

no code implementations • WS 2018 • Amirhossein Tebbifakhr, Ruchit Agrawal, Matteo Negri, Marco Turchi

In the first subtask, our system improves over the baseline up to -5.3 TER and +8.23 BLEU points, ranking second out of 11 submitted runs.

Automatic Post-Editing NMT +2

Findings of the WMT 2018 Shared Task on Automatic Post-Editing

no code implementations • WS 2018 • Rajen Chatterjee, Matteo Negri, Raphael Rubino, Marco Turchi

In the former subtask, characterized by original translations of lower quality, top results achieved impressive improvements, up to -6.24 TER and +9.53 BLEU points over the baseline "do-nothing" system.

Automatic Post-Editing NMT +1

Proceedings of the Third Conference on Machine Translation: Shared Task Papers

no code implementations • EMNLP 2018 • Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Matt Post, Lucia Specia, Marco Turchi, Karin Verspoor

Machine Translation Translation

eSCAPE: a Large-scale Synthetic Corpus for Automatic Post-Editing

no code implementations • LREC 2018 • Matteo Negri, Marco Turchi, Rajen Chatterjee, Nicola Bertoldi

eSCAPE consists of millions of entries in which the MT element of the training triplets has been obtained by translating the source side of publicly-available parallel corpora, and using the target side as an artificial human post-edit.

Automatic Post-Editing Sentence

Paper
Add Code

Combining Quality Estimation and Automatic Post-editing to Enhance Machine Translation output

no code implementations • WS 2018 • Rajen Chatterjee, Matteo Negri, Marco Turchi, Frédéric Blain, Lucia Specia

Automatic Post-Editing Translation

Paper
Add Code

Guiding Neural Machine Translation Decoding with External Knowledge

no code implementations • WS 2017 • Rajen Chatterjee, Matteo Negri, Marco Turchi, Marcello Federico, Lucia Specia, Frédéric Blain

Machine Translation Translation

Paper
Add Code

Multi-Domain Neural Machine Translation through Unsupervised Adaptation

no code implementations • WS 2017 • M. Amin Farajian, Marco Turchi, Matteo Negri, Marcello Federico

Machine Translation Translation

Paper
Add Code

Findings of the 2017 Conference on Machine Translation (WMT17)

no code implementations • WS 2017 • Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Shu-Jian Huang, Matthias Huck, Philipp Koehn, Qun Liu, Varvara Logacheva, Christof Monz, Matteo Negri, Matt Post, Raphael Rubino, Lucia Specia, Marco Turchi

Automatic Post-Editing Multimodal Machine Translation +1

Paper
Add Code

Multi-source Neural Automatic Post-Editing: FBK's participation in the WMT 2017 APE shared task

no code implementations • WS 2017 • Rajen Chatterjee, M. Amin Farajian, Matteo Negri, Marco Turchi, Ankit Srivastava, Santanu Pal

Automatic Post-Editing Language Modelling

Paper
Add Code

Linguistically Motivated Vocabulary Reduction for Neural Machine Translation from Turkish to English

no code implementations • 31 Jul 2017 • Duygu Ataman, Matteo Negri, Marco Turchi, Marcello Federico

In this paper, we propose a new vocabulary reduction method for NMT, which can reduce the vocabulary of a given input corpus at any rate while also considering the morphological properties of the language.

Machine Translation Morphological Analysis +2

Paper
Add Code

Automatic Quality Estimation for ASR System Combination

no code implementations • 22 Jun 2017 • Shahab Jalalvand, Matteo Negri, Daniele Falavigna, Marco Matassoni, Marco Turchi

In this paper we propose a novel variant of ROVER that takes advantage of ASR quality estimation (QE) for ranking the transcriptions at "segment level" instead of: i) relying on confidence scores, or ii) feeding ROVER with randomly ordered hypotheses.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Neural vs. Phrase-Based Machine Translation in a Multi-Domain Scenario

no code implementations • EACL 2017 • M. Amin Farajian, Marco Turchi, Matteo Negri, Nicola Bertoldi, Marcello Federico

State-of-the-art neural machine translation (NMT) systems are generally trained on specific domains by carefully selecting the training sets and applying proper domain adaptation techniques.

Domain Adaptation Machine Translation +2

Paper
Add Code

Online Automatic Post-editing for MT in a Multi-Domain Translation Environment

no code implementations • EACL 2017 • Rajen Chatterjee, Gebremedhen Gebremelak, Matteo Negri, Marco Turchi

Automatic post-editing (APE) for machine translation (MT) aims to fix recurrent errors made by the MT decoder by learning from correction examples.

Automatic Post-Editing Decoder +1

Paper
Add Code

DNN adaptation by automatic quality estimation of ASR hypotheses

no code implementations • 6 Feb 2017 • Daniele Falavigna, Marco Matassoni, Shahab Jalalvand, Matteo Negri, Marco Turchi

Our hypothesis is that significant improvements can be achieved by: i) automatically transcribing the evaluation data we are currently trying to recognise, and ii) selecting from it a subset of "good quality" instances based on the word error rate (WER) scores predicted by a QE component.

Sentence

Paper
Add Code

Findings of the 2016 Conference on Machine Translation

no code implementations • WS 2016 • Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Varvara Logacheva, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Martin Popel, Matt Post, Raphael Rubino, Carolina Scarton, Lucia Specia, Marco Turchi, Karin Verspoor, Marcos Zampieri

Automatic Post-Editing Multimodal Machine Translation +1

Paper
Add Code

Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

no code implementations • WS 2016 • Ondřej Bojar, Christian Buck, Rajen Chatterjee, Christian Federmann, Liane Guillou, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Aurélie Névéol, Mariana Neves, Pavel Pecina, Martin Popel, Philipp Koehn, Christof Monz, Matteo Negri, Matt Post, Lucia Specia, Karin Verspoor, Jörg Tiedemann, Marco Turchi

Machine Translation Translation

Paper
Add Code

The FBK Participation in the WMT 2016 Automatic Post-editing Shared Task

no code implementations • WS 2016 • Rajen Chatterjee, José G. C. de Souza, Matteo Negri, Marco Turchi

Automatic Post-Editing Data Augmentation

Paper
Add Code

An Unsupervised Method for Automatic Translation Memory Cleaning

no code implementations • ACL 2016 • Masoud Jalili Sabet, Matteo Negri, Marco Turchi, Eduard Barbu

Machine Translation Translation

Paper
Add Code

TranscRater: a Tool for Automatic Speech Recognition Quality Estimation

no code implementations • ACL 2016 • Shahab Jalalvand, Matteo Negri, Marco Turchi, José G. C. de Souza, Daniele Falavigna, Mohammed R. H. Qwaider

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

TMop: a Tool for Unsupervised Translation Memory Cleaning

1 code implementation • ACL 2016 • Masoud Jalili Sabet, Matteo Negri, Marco Turchi, José G. C. de Souza, Marcello Federico

Machine Translation Translation

Paper
Code

FBK HLT-MT at SemEval-2016 Task 1: Cross-lingual Semantic Similarity Measurement Using Quality Estimation Features and Compositional Bilingual Word Embeddings

no code implementations • SEMEVAL 2016 • Duygu Ataman, José G. C. de Souza, Marco Turchi, Matteo Negri

Cross-Lingual Semantic Textual Similarity Machine Translation +6

Paper
Add Code

SentiWords: Deriving a High Precision and High Coverage Lexicon for Sentiment Analysis

no code implementations • 30 Oct 2015 • Lorenzo Gatti, Marco Guerini, Marco Turchi

Using this technique we have built SentiWords, a prior polarity lexicon of approximately 155,000 words, that has both a high precision and a high coverage.

Sentiment Analysis Vocal Bursts Intensity Prediction

Paper
Add Code

The FBK Participation in the WMT15 Automatic Post-editing Shared Task

no code implementations • WS 2015 • Rajen Chatterjee, Marco Turchi, Matteo Negri

Automatic Post-Editing

Paper
Add Code

Findings of the 2015 Workshop on Statistical Machine Translation

no code implementations • WS 2015 • Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Barry Haddow, Matthias Huck, Chris Hokamp, Philipp Koehn, Varvara Logacheva, Christof Monz, Matteo Negri, Matt Post, Carolina Scarton, Lucia Specia, Marco Turchi

Automatic Post-Editing Translation

Paper
Add Code

Online Multitask Learning for Machine Translation Quality Estimation

1 code implementation • IJCNLP 2015 • José G. C. de Souza, Matteo Negri, Elisa Ricci, Marco Turchi

Machine Translation Translation

Paper
Code

Exploring the Planet of the APEs: a Comparative Study of State-of-the-art Methods for MT Automatic Post-Editing

no code implementations • IJCNLP 2015 • Rajen Chatterjee, Marion Weller, Matteo Negri, Marco Turchi

Automatic Post-Editing Domain Adaptation

Paper
Add Code

Knowledge Portability with Semantic Expansion of Ontology Labels

no code implementations • IJCNLP 2015 • Mihael Arcan, Marco Turchi, Paul Buitelaar

Information Retrieval Machine Translation +1

Paper
Add Code

Driving ROVER with Segment-based ASR Quality Estimation

no code implementations • ACL 2015 • Matteo Negri, Marco Turchi, Daniele Falavigna, Shahab Jalalvand

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

MT Quality Estimation for Computer-assisted Translation: Does it Really Help?

no code implementations • IJCNLP 2015 • Marco Turchi, Matteo Negri, Marcello Federico

Machine Translation Translation

Paper
Add Code

Multitask Learning for Adaptive Quality Estimation of Automatically Transcribed Utterances

no code implementations • HLT 2015 • Matteo Negri, José G. C. de Souza, Marco Turchi, Daniele Falavigna, Hamed Zamani

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Assessing the Impact of Translation Errors on Machine Translation Quality with Mixed-effects Models

no code implementations • EMNLP 2014 • Marcello Federico, Matteo Negri, Luisa Bentivogli, Marco Turchi

Machine Translation Translation

Paper
Add Code

The MateCat Tool

no code implementations • COLING 2014 • Marcello Federico, Nicola Bertoldi, Mauro Cettolo, Matteo Negri, Marco Turchi, Marco Trombetti, Alessandro Cattelan, Antonio Farina, Domenico Lupinetti, Andrea Martines, Alberto Massidda, Holger Schwenk, Loïc Barrault, Frederic Blain, Philipp Koehn, Christian Buck, Ulrich Germann

Machine Translation

Paper
Add Code

Quality Estimation for Automatic Speech Recognition

no code implementations • COLING 2014 • Matteo Negri, Marco Turchi, José G. C. de Souza, Daniele Falavigna

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Machine Translation Quality Estimation Across Domains

no code implementations • COLING 2014 • José G. C. de Souza, Marco Turchi, Matteo Negri

Machine Translation Translation

Paper
Add Code

Identification of Bilingual Terms from Monolingual Documents for Statistical Machine Translation

no code implementations • WS 2014 • Mihael Arcan, Claudio Giuliano, Marco Turchi, Paul Buitelaar

Machine Translation Translation

Paper
Add Code

Adaptive Quality Estimation for Machine Translation

no code implementations • ACL 2014 • Marco Turchi, Antonios Anastasopoulos, José G. C. de Souza, Matteo Negri

Machine Translation Translation

Paper
Add Code

FBK-UPV-UEdin participation in the WMT14 Quality Estimation shared-task

no code implementations • WS 2014 • José Guilherme Camargo de Souza, Jesús González-Rubio, Christian Buck, Marco Turchi, Matteo Negri

Language Modelling Machine Translation

Paper
Add Code

Automatic Annotation of Machine Translation Datasets with Binary Quality Judgements

no code implementations • LREC 2014 • Marco Turchi, Matteo Negri

To overcome these issues, we present an automatic method for the annotation of (source, target) pairs with binary judgements that reflect an empirical, and easily interpretable notion of quality.

Machine Translation Re-Ranking +2

Paper
Add Code

Resource Creation and Evaluation for Multilingual Sentiment Analysis in Social Media Texts

no code implementations • LREC 2014 • Alexandra Balahur, Marco Turchi, Ralf Steinberger, Jose-Manuel Perea-Ortega, Guillaume Jacquet, Dilek Küçük, Vanni Zavarella, Adil El Ghali

We show that the use of machine translated data obtained similar results as the use of native-speaker translations of the same data.

Classification General Classification +6

Paper
Add Code

An efficient and user-friendly tool for machine translation quality estimation

no code implementations • LREC 2014 • Kashif Shah, Marco Turchi, Lucia Specia

We present a new version of QUEST, an open source framework for machine translation quality estimation, which brings a number of improvements: (i) it provides a Web interface and functionalities such that non-expert users, e.g. translators or lay-users of machine translations, can get quality predictions (or internal features of the framework) for translations without having to install the toolkit, obtain resources or build prediction models; (ii) it significantly improves over the previous runtime performance by keeping resources (such as language models) in memory; (iii) it provides an option for users to submit the source text only and automatically obtain translations from Bing Translator; (iv) it provides a ranking of multiple translations submitted by users for each source text according to their estimated quality.

Machine Translation Translation

Paper
Add Code

ONTS: "Optima" News Translation System

no code implementations • EACL 2012 • Marco Turchi, Martin Atkinson, Alastair Wilcox, Brett Crawley, Stefano Bucci, Ralf Steinberger, Erik van der Goot

We propose a real-time machine translation system that allows users to select a news category and to translate the related live news articles from Arabic, Czech, Danish, Farsi, French, German, Italian, Polish, Portuguese, Spanish and Turkish into English.

Machine Translation Translation

Paper
Add Code

Sentiment Analysis: How to Derive Prior Polarities from SentiWordNet

no code implementations • EMNLP 2013 • Marco Guerini, Lorenzo Gatti, Marco Turchi

Assigning a positive or negative score to a word out of context (i.e. a word's prior polarity) is a challenging task for sentiment analysis.

General Classification Sentiment Analysis

Paper
Add Code

JRC EuroVoc Indexer JEX - A freely available multi-label categorisation tool

no code implementations • LREC 2012 • Ralf Steinberger, Mohamed Ebrahim, Marco Turchi

EuroVoc (2012) is a highly multilingual thesaurus consisting of over 6,700 hierarchically organised subject domains used by European Institutions and many authorities in Member States of the European Union (EU) for the classification and retrieval of official documents.

Classification Clustering +4