no code implementations • VarDial (COLING) 2020 • Amirhossein Tebbifakhr, Matteo Negri, Marco Turchi
In this work, we tackle the problem in a multilingual setting where a single NMT model translates from multiple languages for downstream automatic processing in the target language.
no code implementations • AMTA 2016 • Rajen Chatterjee, Mihael Arcan, Matteo Negri, Marco Turchi
In recent years, several end-to-end online translation systems have been proposed to successfully incorporate human post-editing feedback in the translation workflow.
no code implementations • NAACL (GeBNLP) 2022 • Beatrice Savoldi, Marco Gaido, Luisa Bentivogli, Matteo Negri, Marco Turchi
In this work, we contribute to such a line of inquiry by exploring the emergence of gender bias in Speech Translation (ST).
no code implementations • IWSLT (ACL) 2022 • Antonios Anastasopoulos, Loïc Barrault, Luisa Bentivogli, Marcely Zanon Boito, Ondřej Bojar, Roldano Cattoni, Anna Currey, Georgiana Dinu, Kevin Duh, Maha Elbayad, Clara Emmanuel, Yannick Estève, Marcello Federico, Christian Federmann, Souhir Gahbiche, Hongyu Gong, Roman Grundkiewicz, Barry Haddow, Benjamin Hsu, Dávid Javorský, Vĕra Kloudová, Surafel Lakew, Xutai Ma, Prashant Mathur, Paul McNamee, Kenton Murray, Maria Nǎdejde, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, John Ortega, Juan Pino, Elizabeth Salesky, Jiatong Shi, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Yogesh Virkar, Alexander Waibel, Changhan Wang, Shinji Watanabe
The evaluation campaign of the 19th International Conference on Spoken Language Translation featured eight shared tasks: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Speech to speech translation, (iv) Low-resource speech translation, (v) Multilingual speech translation, (vi) Dialect speech translation, (vii) Formality control for speech translation, (viii) Isometric speech translation.
no code implementations • EAMT 2020 • Amirhossein Tebbifakhr, Matteo Negri, Marco Turchi
We address this problem by proposing a multi-task approach to machine-oriented NMT adaptation, which is capable of serving multiple downstream tasks with a single system.
no code implementations • EMNLP 2021 • Marco Gaido, Susana Rodríguez, Matteo Negri, Luisa Bentivogli, Marco Turchi
Automatic translation systems are known to struggle with rare words.
no code implementations • WMT (EMNLP) 2021 • Farhad Akhbardeh, Arkady Arkhangorodsky, Magdalena Biesialska, Ondřej Bojar, Rajen Chatterjee, Vishrav Chaudhary, Marta R. Costa-Jussa, Cristina España-Bonet, Angela Fan, Christian Federmann, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Barry Haddow, Leonie Harter, Kenneth Heafield, Christopher Homan, Matthias Huck, Kwabena Amponsah-Kaakyire, Jungo Kasai, Daniel Khashabi, Kevin Knight, Tom Kocmi, Philipp Koehn, Nicholas Lourie, Christof Monz, Makoto Morishita, Masaaki Nagata, Ajay Nagesh, Toshiaki Nakazawa, Matteo Negri, Santanu Pal, Allahsera Auguste Tapo, Marco Turchi, Valentin Vydrin, Marcos Zampieri
This paper presents the results of the news translation task, the multilingual low-resource translation for Indo-European languages, the triangular translation task, and the automatic post-editing task organised as part of the Conference on Machine Translation (WMT) 2021. In the news task, participants were asked to build machine translation systems for any of 10 language pairs, to be evaluated on test sets consisting mainly of news stories.
no code implementations • WMT (EMNLP) 2020 • Rajen Chatterjee, Markus Freitag, Matteo Negri, Marco Turchi
Due to i) the different source/domain of data compared to the past (Wikipedia vs Information Technology), ii) the different quality of the initial translations to be corrected and iii) the introduction of a new language pair (English-Chinese), this year’s results are not directly comparable with last year’s round.
no code implementations • MTSummit 2021 • Surafel M. Lakew, Matteo Negri, Marco Turchi
Neural Machine Translation (NMT) approaches employing monolingual data are showing steady improvements in resource-rich conditions.
no code implementations • IWSLT 2017 • Surafel M. Lakew, Quintino F. Lotito, Marco Turchi, Matteo Negri, Marcello Federico
Particularly, we focus on the four zero-shot directions and show how a multilingual model trained with small data can provide reasonable results.
no code implementations • IWSLT 2016 • M. Amin Farajian, Rajen Chatterjee, Costanza Conforti, Shahab Jalalvand, Vevake Balaraman, Mattia A. Di Gangi, Duygu Ataman, Marco Turchi, Matteo Negri, Marcello Federico
They leverage linguistic information such as lemmas and part-of-speech tags of the source words in the form of additional factors along with the words.
no code implementations • EMNLP (IWSLT) 2019 • Mattia A. Di Gangi, Matteo Negri, Viet Nhat Nguyen, Amirhossein Tebbifakhr, Marco Turchi
On the training side, we focused on data augmentation techniques recently proposed for ST and automatic speech recognition (ASR).
Automatic Speech Recognition (ASR)
no code implementations • EMNLP (IWSLT) 2019 • Jan Niehues, Roldano Cattoni, Sebastian Stüker, Matteo Negri, Marco Turchi, Thanh-Le Ha, Elizabeth Salesky, Ramon Sanabria, Loic Barrault, Lucia Specia, Marcello Federico
The IWSLT 2019 evaluation campaign featured three tasks: speech translation of (i) TED talks and (ii) How2 instructional videos from English into German and Portuguese, and (iii) text translation of TED talks from English into Czech.
no code implementations • ACL (IWSLT) 2021 • Antonios Anastasopoulos, Ondřej Bojar, Jacob Bremerman, Roldano Cattoni, Maha Elbayad, Marcello Federico, Xutai Ma, Satoshi Nakamura, Matteo Negri, Jan Niehues, Juan Pino, Elizabeth Salesky, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Alexander Waibel, Changhan Wang, Matthew Wiesner
The evaluation campaign of the International Conference on Spoken Language Translation (IWSLT 2021) featured four shared tasks this year: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Multilingual speech translation, (iv) Low-resource speech translation.
1 code implementation • EAMT 2022 • Alina Karakanta, Luisa Bentivogli, Mauro Cettolo, Matteo Negri, Marco Turchi
Subtitling tools are recently being adapted for post-editing by providing automatically generated subtitles, and featuring not only machine translation, but also automatic segmentation and synchronisation.
no code implementations • EAMT 2022 • Luisa Bentivogli, Mauro Cettolo, Marco Gaido, Alina Karakanta, Matteo Negri, Marco Turchi
This project aimed at extending the test sets of the MuST-C speech translation (ST) corpus with new reference translations.
no code implementations • EAMT 2022 • Alina Karakanta, Luisa Bentivogli, Mauro Cettolo, Matteo Negri, Marco Turchi
In response to the increasing interest in automatic subtitling, this EAMT-funded project aimed at collecting subtitle post-editing data in a real use case scenario where professional subtitlers edit automatically generated subtitles.
no code implementations • 16 Jan 2025 • Beatrice Savoldi, Eleonora Cupin, Manjinder Thind, Anne Lauscher, Andrea Piergentili, Matteo Negri, Luisa Bentivogli
Gender-neutral language reflects societal and linguistic shifts towards greater inclusivity by avoiding the implication that one gender is the norm over others.
no code implementations • 26 Dec 2024 • Simona Frenda, Andrea Piergentili, Beatrice Savoldi, Marco Madeddu, Martina Rosola, Silvia Casola, Chiara Ferrando, Viviana Patti, Matteo Negri, Luisa Bentivogli
The challenge, designed to assess and monitor the recognition and generation of gender-fair language in both mono- and cross-lingual scenarios, includes three tasks: (1) the detection of gendered expressions in Italian sentences, (2) the reformulation of gendered expressions into gender-fair alternatives, and (3) the generation of gender-fair language in automatic translation from English to Italian.
no code implementations • 16 Dec 2024 • Beomseok Lee, Marco Gaido, Ioan Calapodescu, Laurent Besacier, Matteo Negri
While crowdsourcing is an established solution for facilitating and scaling the collection of speech data, the involvement of non-experts necessitates protocols to ensure final data quality.
no code implementations • 7 Nov 2024 • Ibrahim Said Ahmad, Antonios Anastasopoulos, Ondřej Bojar, Claudia Borg, Marine Carpuat, Roldano Cattoni, Mauro Cettolo, William Chen, Qianqian Dong, Marcello Federico, Barry Haddow, Dávid Javorský, Mateusz Krubiński, Tsz Kin Lam, Xutai Ma, Prashant Mathur, Evgeny Matusov, Chandresh Maurya, John McCrae, Kenton Murray, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, Atul Kr. Ojha, John Ortega, Sara Papi, Peter Polák, Adam Pospíšil, Pavel Pecina, Elizabeth Salesky, Nivedita Sethiya, Balaram Sarkar, Jiatong Shi, Claytone Sikasote, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Brian Thompson, Marco Turchi, Alex Waibel, Shinji Watanabe, Patrick Wilken, Petr Zemánek, Rodolfo Zevallos
This paper reports on the shared tasks organized by the 21st IWSLT Conference.
no code implementations • 3 Nov 2024 • Dennis Fucci, Marco Gaido, Beatrice Savoldi, Matteo Negri, Mauro Cettolo, Luisa Bentivogli
Spurred by the demand for interpretable models, research on eXplainable AI for language technologies has experienced significant growth, with feature attribution methods emerging as a cornerstone of this progress.
1 code implementation • 1 Oct 2024 • Beatrice Savoldi, Sara Papi, Matteo Negri, Ana Guerberof, Luisa Bentivogli
Gender bias in machine translation (MT) is recognized as an issue that can harm people and society.
1 code implementation • 1 Oct 2024 • Marco Gaido, Sara Papi, Luisa Bentivogli, Alessio Brutti, Mauro Cettolo, Roberto Gretter, Marco Matassoni, Mohamed Nabih, Matteo Negri
The rise of foundation models (FMs), coupled with regulatory efforts addressing their risks and impacts, has sparked significant interest in open-source models.
1 code implementation • 24 Sep 2024 • Francesco D'Amico, Matteo Negri
Transformers are one of the most successful architectures of modern neural networks.
1 code implementation • 7 Aug 2024 • Beomseok Lee, Ioan Calapodescu, Marco Gaido, Matteo Negri, Laurent Besacier
We present Speech-MASSIVE, a multilingual Spoken Language Understanding (SLU) dataset comprising the speech counterpart for a portion of the MASSIVE textual corpus.
no code implementations • 8 Jul 2024 • Silvio Kalaj, Clarissa Lauditi, Gabriele Perugini, Carlo Lucibello, Enrico M. Malatesta, Matteo Negri
It has been recently shown that a learning transition happens when a Hopfield Network stores examples generated as superpositions of random features, where new attractors corresponding to such features appear in the model.
1 code implementation • 20 Jun 2024 • Sara Papi, Marco Gaido, Matteo Negri, Luisa Bentivogli
This paper describes FBK's participation in the Simultaneous Translation Evaluation Campaign at IWSLT 2024.
1 code implementation • 10 Jun 2024 • Sara Papi, Marco Gaido, Matteo Negri, Luisa Bentivogli
To fill this gap, we introduce StreamAtt, the first StreamST policy, and propose StreamLAAL, the first StreamST latency metric designed to be comparable with existing metrics for SimulST.
no code implementations • 6 Jun 2024 • Matthias Sperber, Ondřej Bojar, Barry Haddow, Dávid Javorský, Xutai Ma, Matteo Negri, Jan Niehues, Peter Polák, Elizabeth Salesky, Katsuhito Sudoh, Marco Turchi
Human evaluation is a critical component in machine translation system development and has received much attention in text translation research.
2 code implementations • 17 May 2024 • Marco Gaido, Sara Papi, Matteo Negri, Mauro Cettolo, Luisa Bentivogli
Subtitling plays a crucial role in enhancing the accessibility of audiovisual content and encompasses three primary subtasks: translating spoken dialogue, segmenting translations into concise textual units, and estimating timestamps that govern their on-screen duration.
no code implementations • 14 May 2024 • Andrea Piergentili, Beatrice Savoldi, Matteo Negri, Luisa Bentivogli
In this direction, we explore prompting techniques with large language models (LLMs) to translate from English into Italian using neomorphemes.
1 code implementation • 20 Feb 2024 • Marco Gaido, Sara Papi, Matteo Negri, Luisa Bentivogli
The attention mechanism, a cornerstone of state-of-the-art neural models, faces computational hurdles in processing long sequences due to its quadratic complexity.
no code implementations • 19 Feb 2024 • Marco Gaido, Sara Papi, Matteo Negri, Luisa Bentivogli
The field of natural language processing (NLP) has recently witnessed a transformative shift with the emergence of foundation models, particularly Large Language Models (LLMs) that have revolutionized text-based NLP.
1 code implementation • 8 Feb 2024 • Beatrice Savoldi, Andrea Piergentili, Dennis Fucci, Matteo Negri, Luisa Bentivogli
Gender-neutral translation (GNT) that avoids biased and undue binary assumptions is a pivotal challenge for the creation of more inclusive translation technologies.
1 code implementation • 30 Oct 2023 • Beatrice Savoldi, Marco Gaido, Matteo Negri, Luisa Bentivogli
As part of the WMT-2023 "Test suites" shared task, in this paper we summarize the results of two test suites evaluations: MuST-SHE-WMT23 and INES.
1 code implementation • 24 Oct 2023 • Dennis Fucci, Marco Gaido, Sara Papi, Mauro Cettolo, Matteo Negri, Luisa Bentivogli
When translating words referring to the speaker, speech translation (ST) systems should not resort to default masculine generics nor rely on potentially misleading vocal traits.
1 code implementation • 23 Oct 2023 • Marco Gaido, Dennis Fucci, Matteo Negri, Luisa Bentivogli
When translating from notional gender languages (e.g., English) into grammatical gender languages (e.g., Italian), the generated translation requires explicit gender assignments for various words, including those referring to the speaker.
1 code implementation • 10 Oct 2023 • Dennis Fucci, Marco Gaido, Matteo Negri, Mauro Cettolo, Luisa Bentivogli
Automatic speech recognition (ASR) systems are known to be sensitive to the sociolinguistic variability of speech data, in which gender plays a crucial role.
Automatic Speech Recognition (ASR)
1 code implementation • 8 Oct 2023 • Andrea Piergentili, Beatrice Savoldi, Dennis Fucci, Matteo Negri, Luisa Bentivogli
Gender inequality is embedded in our communication practices and perpetuated in translation technologies.
1 code implementation • 27 Sep 2023 • Sara Papi, Marco Gaido, Matteo Negri
This paper describes FBK's participation in the Simultaneous Translation and Automatic Subtitling tracks of the IWSLT 2023 Evaluation Campaign.
2 code implementations • 19 May 2023 • Sara Papi, Marco Turchi, Matteo Negri
Attention is the core mechanism of today's most used architectures for natural language processing and has been analyzed from many perspectives, including its effectiveness for machine translation-related tasks.
no code implementations • 29 Mar 2023 • Matteo Negri, Clarissa Lauditi, Gabriele Perugini, Carlo Lucibello, Enrico Malatesta
The Hopfield model is a paradigmatic model of neural networks that has been analyzed for many decades in the statistical physics, neuroscience, and machine learning communities.
2 code implementations • 28 Mar 2023 • Sara Papi, Marco Gaido, Andrea Pilzer, Matteo Negri
Despite its crucial role in research experiments, code correctness is often presumed only on the basis of the perceived quality of results.
no code implementations • 24 Jan 2023 • Andrea Piergentili, Dennis Fucci, Beatrice Savoldi, Luisa Bentivogli, Matteo Negri
Gender inclusivity in language technologies has become a prominent research topic.
2 code implementations • 15 Dec 2022 • Sara Papi, Matteo Negri, Marco Turchi
The study of the attention mechanism has sparked interest in many fields, such as language modeling and machine translation.
1 code implementation • 21 Oct 2022 • Marco Gaido, Sara Papi, Matteo Negri, Marco Turchi
Modern automatic translation systems aim at placing the human at the center by providing contextual support and knowledge.
no code implementations • 10 Oct 2022 • Daniele Ancora, Matteo Negri, Antonio Gianfrate, Dimitris Trypogeorgos, Lorenzo Dominici, Daniele Sanvitto, Federico Ricci-Tersenghi, Luca Leuzzi
In the field of disordered photonics, a common objective is to characterize optically opaque materials for controlling light delivery or performing imaging.
1 code implementation • 27 Sep 2022 • Sara Papi, Marco Gaido, Alina Karakanta, Mauro Cettolo, Matteo Negri, Marco Turchi
Automatic subtitling is the task of automatically translating the speech of audiovisual content into short pieces of timed text, i.e., subtitles and their corresponding timestamps.
1 code implementation • 21 Sep 2022 • Sara Papi, Alina Karakanta, Matteo Negri, Marco Turchi
Speech translation for subtitling (SubST) is the task of automatically translating speech data into well-formed subtitles by inserting subtitle breaks compliant with specific display guidelines.
1 code implementation • NAACL (AutoSimTrans) 2022 • Sara Papi, Marco Gaido, Matteo Negri, Marco Turchi
Simultaneous speech translation (SimulST) systems aim at generating their output with the lowest possible latency, which is normally computed in terms of Average Lagging (AL).
1 code implementation • IWSLT (ACL) 2022 • Marco Gaido, Matteo Negri, Marco Turchi
Recent work has shown that systems for speech translation (ST) -- similarly to automatic speech recognition (ASR) -- poorly handle person names.
Automatic Speech Recognition (ASR)
1 code implementation • IWSLT (ACL) 2022 • Marco Gaido, Sara Papi, Dennis Fucci, Giuseppe Fiameni, Matteo Negri, Marco Turchi
The primary goal of FBK's systems submission to the IWSLT 2022 offline and simultaneous speech translation tasks is to reduce model training costs without sacrificing translation quality.
1 code implementation • 8 Apr 2022 • Sara Papi, Marco Gaido, Matteo Negri, Marco Turchi
In simultaneous speech translation (SimulST), finding the best trade-off between high translation quality and low latency is a challenging task.
1 code implementation • ACL 2022 • Beatrice Savoldi, Marco Gaido, Luisa Bentivogli, Matteo Negri, Marco Turchi
Gender bias is largely recognized as a problematic phenomenon affecting language technologies, with recent studies underscoring that it might surface differently across languages.
1 code implementation • 25 Nov 2021 • Matteo Negri, Guido Tiana, Riccardo Zecchina
The differing ability of polypeptide conformations to act as the native state of proteins has long been rationalized in terms of differing kinetic accessibility or thermodynamic stability.
no code implementations • 31 Oct 2021 • Sara Papi, Matteo Negri, Marco Turchi
Simultaneous speech translation (SimulST) is the task in which output generation has to be performed on partial, incremental speech input.
1 code implementation • 15 Sep 2021 • Marco Gaido, Susana Rodríguez, Matteo Negri, Luisa Bentivogli, Marco Turchi
Automatic translation systems are known to struggle with rare words.
1 code implementation • EMNLP 2021 • Sara Papi, Marco Gaido, Matteo Negri, Marco Turchi
Transformer-based models have gained increasing popularity achieving state-of-the-art performance in many research fields including speech translation.
Ranked #1 on Speech-to-Text Translation on MuST-C EN->NL
1 code implementation • MTSummit 2021 • Alina Karakanta, Sara Papi, Matteo Negri, Marco Turchi
Experiments on three language pairs (en→it, de, fr) show that scrolling lines is the only mode achieving an acceptable reading speed while keeping delay close to a 4-second threshold.
1 code implementation • ACL (IWSLT) 2021 • Alina Karakanta, Marco Gaido, Matteo Negri, Marco Turchi
Speech translation (ST) has lately received growing interest for the generation of subtitles without the need for an intermediate source language transcription and timing (i.e., captions).
no code implementations • ACL (IWSLT) 2021 • Sara Papi, Marco Gaido, Matteo Negri, Marco Turchi
Both knowledge distillation and the first fine-tuning step are carried out on manually segmented real and synthetic data, the latter being generated with an MT system trained on the available corpora.
no code implementations • ACL 2021 • Luisa Bentivogli, Mauro Cettolo, Marco Gaido, Alina Karakanta, Alberto Martinelli, Matteo Negri, Marco Turchi
Five years after the first published proofs of concept, direct approaches to speech translation (ST) are now competing with traditional cascade solutions.
1 code implementation • Findings (ACL) 2021 • Marco Gaido, Beatrice Savoldi, Luisa Bentivogli, Matteo Negri, Marco Turchi
In light of this finding, we propose a combined approach that preserves BPE overall translation quality, while leveraging the higher ability of character-based segmentation to properly translate gender.
no code implementations • ICNLSP 2021 • Marco Gaido, Matteo Negri, Mauro Cettolo, Marco Turchi
The audio segmentation mismatch between training data and those seen at run-time is a major problem in direct speech translation.
1 code implementation • 13 Apr 2021 • Beatrice Savoldi, Marco Gaido, Luisa Bentivogli, Matteo Negri, Marco Turchi
Machine translation (MT) technology has facilitated our daily tasks by providing accessible shortcuts for gathering, elaborating and communicating information.
no code implementations • EACL 2021 • Jan Niehues, Elizabeth Salesky, Marco Turchi, Matteo Negri
Speech translation is the translation of speech in one language typically to text in another, traditionally accomplished through a combination of automatic speech recognition and machine translation.
Automatic Speech Recognition (ASR)
no code implementations • 10 Mar 2021 • Surafel M. Lakew, Matteo Negri, Marco Turchi
Neural Machine Translation (NMT) approaches employing monolingual data are showing steady improvements in resource-rich conditions.
1 code implementation • EACL 2021 • Marco Gaido, Mauro Cettolo, Matteo Negri, Marco Turchi
Previous studies demonstrated that a dynamic phone-informed compression of the input audio is beneficial for speech translation (ST).
no code implementations • 2 Feb 2021 • Elizabeth Salesky, Matthew Wiesner, Jacob Bremerman, Roldano Cattoni, Matteo Negri, Marco Turchi, Douglas W. Oard, Matt Post
We present the Multilingual TEDx corpus, built to support speech recognition (ASR) and speech translation (ST) research across many non-English source languages.
1 code implementation • 9 Dec 2020 • Marco Gaido, Mattia A. Di Gangi, Matteo Negri, Marco Turchi
Direct speech translation (ST) has been shown to be a complex task requiring knowledge transfer from its sub-tasks: automatic speech recognition (ASR) and machine translation (MT).
Automatic Speech Recognition (ASR)
no code implementations • COLING 2020 • Marco Gaido, Beatrice Savoldi, Luisa Bentivogli, Matteo Negri, Marco Turchi
In particular, by translating speech audio data without intermediate transcription, direct ST models are able to leverage and preserve essential information present in the input (e.g., speaker's vocal characteristics) that is otherwise lost in the cascade framework.
no code implementations • COLING 2020 • Alina Karakanta, Supratik Bhattacharya, Shravan Nayak, Timo Baumann, Matteo Negri, Marco Turchi
Dubbing has two shades; synchronisation constraints are applied only when the actor's mouth is visible on screen, while the translation is unconstrained for off-screen dubbing.
no code implementations • 27 Oct 2020 • Carlo Baldassi, Enrico M. Malatesta, Matteo Negri, Riccardo Zecchina
We analyze the connection between minimizers with good generalizing properties and high local entropy regions of a threshold-linear classifier in Gaussian mixtures with the mean squared error loss function.
no code implementations • AMTA 2020 • Mattia Antonino Di Gangi, Marco Gaido, Matteo Negri, Marco Turchi
Then, subword-level segmentation became the state of the art in neural machine translation as it produces shorter sequences that reduce the training time, while being superior to word-level models.
1 code implementation • 5 Aug 2020 • Marco Gaido, Mattia Antonino Di Gangi, Matteo Negri, Mauro Cettolo, Marco Turchi
We show that our context-aware solution is more robust to VAD-segmented input, outperforming a strong base model and the fine-tuning on different VAD segmentations of an English-German test set by up to 4.25 BLEU points.
no code implementations • WS 2020 • Ebrahim Ansari, Amittai Axelrod, Nguyen Bach, Ondřej Bojar, Roldano Cattoni, Fahim Dalvi, Nadir Durrani, Marcello Federico, Christian Federmann, Jiatao Gu, Fei Huang, Kevin Knight, Xutai Ma, Ajay Nagesh, Matteo Negri, Jan Niehues, Juan Pino, Elizabeth Salesky, Xing Shi, Sebastian Stüker, Marco Turchi, Alexander Waibel, Changhan Wang
The evaluation campaign of the International Conference on Spoken Language Translation (IWSLT 2020) featured six challenge tracks this year: (i) Simultaneous speech translation, (ii) Video speech translation, (iii) Offline speech translation, (iv) Conversational speech translation, (v) Open domain translation, and (vi) Non-native speech translation.
1 code implementation • ACL 2020 • Luisa Bentivogli, Beatrice Savoldi, Matteo Negri, Mattia Antonino Di Gangi, Roldano Cattoni, Marco Turchi
Translating from languages without productive grammatical gender like English into gender-marked languages is a well-known difficulty for machines.
no code implementations • WS 2020 • Marco Gaido, Mattia Antonino Di Gangi, Matteo Negri, Marco Turchi
The test talks are provided in two versions: one contains the data already segmented with automatic tools and the other is the raw data without any segmentation.
no code implementations • WS 2020 • Alina Karakanta, Matteo Negri, Marco Turchi
Subtitling is becoming increasingly important for disseminating information, given the enormous amounts of audiovisual content becoming available daily.
1 code implementation • 31 Mar 2020 • Surafel M. Lakew, Matteo Negri, Marco Turchi
Recent advances in Neural Machine Translation (NMT) have shown improvements in low-resource language (LRL) translation tasks.
Low-Resource Neural Machine Translation
no code implementations • LREC 2020 • Alina Karakanta, Matteo Negri, Marco Turchi
Growing needs in localising audiovisual content in multiple languages through subtitles call for the development of automatic solutions for human subtitling.
1 code implementation • EMNLP (IWSLT) 2019 • Surafel M. Lakew, Alina Karakanta, Marcello Federico, Matteo Negri, Marco Turchi
In order to improve NMT for LRL, we employ perplexity to select HRL data that are most similar to the LRL on the basis of language distance.
no code implementations • 23 Oct 2019 • Mattia Antonino Di Gangi, Viet-Nhat Nguyen, Matteo Negri, Marco Turchi
Despite recent technology advancements, the effectiveness of neural approaches to end-to-end speech-to-text translation is still limited by the paucity of publicly available training corpora.
no code implementations • 8 Oct 2019 • Mattia Antonino Di Gangi, Matteo Negri, Marco Turchi
Multilingual solutions are widely studied in MT and usually rely on "target forcing", in which multilingual parallel data are combined to train a single model by prepending to the input sequences a language token that specifies the target language.
Automatic Speech Recognition (ASR)
no code implementations • IJCNLP 2019 • Amirhossein Tebbifakhr, Luisa Bentivogli, Matteo Negri, Marco Turchi
Towards this objective, we present a reinforcement learning technique based on a new candidate sampling strategy, which exploits the results obtained on the downstream task as weak feedback.
no code implementations • 29 Sep 2019 • Matteo Negri, Davide Bergamini, Carlo Baldassi, Riccardo Zecchina, Christoph Feinauer
Generative processes in biology and other fields often produce data that can be regarded as resulting from a composition of basic features.
1 code implementation • 16 Sep 2019 • Surafel M. Lakew, Marcello Federico, Matteo Negri, Marco Turchi
In recent years, Neural Machine Translation (NMT) has been shown to be more effective than phrase-based statistical methods, thus quickly becoming the state of the art in machine translation (MT).
no code implementations • WS 2019 • Amirhossein Tebbifakhr, Matteo Negri, Marco Turchi
For this purpose, following the common approach in multilingual NMT, we prepend a special token to the beginning of both the source text and the MT output indicating the required amount of post-editing.
no code implementations • WS 2019 • Rajen Chatterjee, Christian Federmann, Matteo Negri, Marco Turchi
Seven teams participated in the English-German task, with a total of 18 submitted runs.
no code implementations • WS 2019 • Alessio Palmero Aprosio, Sara Tonelli, Marco Turchi, Matteo Negri, Mattia A. Di Gangi
Inspired by the machine translation field, in which synthetic parallel pairs generated from monolingual data yield significant improvements to neural models, in this paper we exploit large amounts of heterogeneous data to automatically select simple sentences, which are then used to create synthetic simplification pairs.
no code implementations • NAACL 2019 • Mattia A. Di Gangi, Roldano Cattoni, Luisa Bentivogli, Matteo Negri, Marco Turchi
Current research on spoken language translation (SLT) has to confront the scarcity of sizeable and publicly available training corpora.
Automatic Speech Recognition (ASR)
1 code implementation • IWSLT 2017 • Surafel M. Lakew, Quintino F. Lotito, Matteo Negri, Marco Turchi, Marcello Federico
Recent work on multilingual neural machine translation reported competitive performance with respect to bilingual models and surprisingly good performance even on (zero-shot) translation directions not observed at training time.
2 code implementations • IWSLT (EMNLP) 2018 • Surafel M. Lakew, Aliia Erofeeva, Matteo Negri, Marcello Federico, Marco Turchi
Our approach allows extending an initial model for a given language pair to cover new languages by adapting its vocabulary as long as new data become available (i.e., introducing new vocabulary items if they are not included in the initial model).
no code implementations • WS 2018 • José G. Camargo de Souza, Michael Kozielski, Prashant Mathur, Ernie Chang, Marco Guerini, Matteo Negri, Marco Turchi, Evgeny Matusov
The setting requires the generation process to be fast and the generated title to be both human-readable and concise.
no code implementations • IWSLT (EMNLP) 2018 • Mattia Antonino Di Gangi, Roberto Dessì, Roldano Cattoni, Matteo Negri, Marco Turchi
This paper describes FBK's submission to the end-to-end English-German speech translation task at IWSLT 2018.
no code implementations • WS 2018 • Amirhossein Tebbifakhr, Ruchit Agrawal, Matteo Negri, Marco Turchi
In the first subtask, our system improves over the baseline up to -5.3 TER and +8.23 BLEU points, ranking second out of 11 submitted runs.
no code implementations • WS 2018 • Rajen Chatterjee, Matteo Negri, Raphael Rubino, Marco Turchi
In the former subtask, characterized by original translations of lower quality, top results achieved impressive improvements, up to -6.24 TER and +9.53 BLEU points over the baseline "do-nothing" system.
no code implementations • EMNLP 2018 • Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Matt Post, Lucia Specia, Marco Turchi, Karin Verspoor
no code implementations • LREC 2018 • Matteo Negri, Marco Turchi, Rajen Chatterjee, Nicola Bertoldi
eSCAPE consists of millions of entries in which the MT element of the training triplets has been obtained by translating the source side of publicly available parallel corpora, and using the target side as an artificial human post-edit.
no code implementations • WS 2017 • Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Shu-Jian Huang, Matthias Huck, Philipp Koehn, Qun Liu, Varvara Logacheva, Christof Monz, Matteo Negri, Matt Post, Raphael Rubino, Lucia Specia, Marco Turchi
no code implementations • 31 Jul 2017 • Duygu Ataman, Matteo Negri, Marco Turchi, Marcello Federico
In this paper, we propose a new vocabulary reduction method for NMT, which can reduce the vocabulary of a given input corpus at any rate while also considering the morphological properties of the language.
no code implementations • 22 Jun 2017 • Shahab Jalalvand, Matteo Negri, Daniele Falavigna, Marco Matassoni, Marco Turchi
In this paper we propose a novel variant of ROVER that takes advantage of ASR quality estimation (QE) for ranking the transcriptions at "segment level" instead of: i) relying on confidence scores, or ii) feeding ROVER with randomly ordered hypotheses.
no code implementations • EACL 2017 • Rajen Chatterjee, Gebremedhen Gebremelak, Matteo Negri, Marco Turchi
Automatic post-editing (APE) for machine translation (MT) aims to fix recurrent errors made by the MT decoder by learning from correction examples.
no code implementations • EACL 2017 • M. Amin Farajian, Marco Turchi, Matteo Negri, Nicola Bertoldi, Marcello Federico
State-of-the-art neural machine translation (NMT) systems are generally trained on specific domains by carefully selecting the training sets and applying proper domain adaptation techniques.
no code implementations • 6 Feb 2017 • Daniele Falavigna, Marco Matassoni, Shahab Jalalvand, Matteo Negri, Marco Turchi
Our hypothesis is that significant improvements can be achieved by: i) automatically transcribing the evaluation data we are currently trying to recognise, and ii) selecting from it a subset of "good quality" instances based on the word error rate (WER) scores predicted by a QE component.
no code implementations • WS 2016 • Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Varvara Logacheva, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Martin Popel, Matt Post, Raphael Rubino, Carolina Scarton, Lucia Specia, Marco Turchi, Karin Verspoor, Marcos Zampieri
no code implementations • WS 2016 • Ondřej Bojar, Christian Buck, Rajen Chatterjee, Christian Federmann, Liane Guillou, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Aurélie Névéol, Mariana Neves, Pavel Pecina, Martin Popel, Philipp Koehn, Christof Monz, Matteo Negri, Matt Post, Lucia Specia, Karin Verspoor, Jörg Tiedemann, Marco Turchi
no code implementations • SEMEVAL 2016 • Duygu Ataman, José G. C. de Souza, Marco Turchi, Matteo Negri
no code implementations • WS 2015 • Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Barry Haddow, Matthias Huck, Chris Hokamp, Philipp Koehn, Varvara Logacheva, Christof Monz, Matteo Negri, Matt Post, Carolina Scarton, Lucia Specia, Marco Turchi
no code implementations • COLING 2014 • Marcello Federico, Nicola Bertoldi, Mauro Cettolo, Matteo Negri, Marco Turchi, Marco Trombetti, Alessandro Cattelan, Antonio Farina, Domenico Lupinetti, Andrea Martines, Alberto Massidda, Holger Schwenk, Loïc Barrault, Frederic Blain, Philipp Koehn, Christian Buck, Ulrich Germann
no code implementations • LREC 2014 • Marco Turchi, Matteo Negri
To overcome these issues, we present an automatic method for the annotation of (source, target) pairs with binary judgements that reflect an empirical and easily interpretable notion of quality.
no code implementations • LREC 2012 • Matteo Negri, Yashar Mehdad, Alessandro Marchetti, Danilo Giampiccolo, Luisa Bentivogli
We present a framework for the acquisition of sentential paraphrases based on crowdsourcing.