1 code implementation • 13 Mar 2025 • Kshitij Ambilduke, Ben Peters, Sonal Sannigrahi, Anil Keshwani, Tsz Kin Lam, Bruno Martins, Marcely Zanon Boito, André F. T. Martins
Large language models (LLMs) have shown remarkable performance and generalization capabilities across multiple languages and tasks, making them very attractive targets for multi-modality integration (e.g., images or speech).
no code implementations • 4 Jan 2025 • Tsz Kin Lam, Marco Gaido, Sara Papi, Luisa Bentivogli, Barry Haddow
Following the remarkable success of Large Language Models (LLMs) in NLP tasks, there is increasing interest in extending their capabilities to speech -- the most common form of communication.
no code implementations • 7 Nov 2024 • Ibrahim Said Ahmad, Antonios Anastasopoulos, Ondřej Bojar, Claudia Borg, Marine Carpuat, Roldano Cattoni, Mauro Cettolo, William Chen, Qianqian Dong, Marcello Federico, Barry Haddow, Dávid Javorský, Mateusz Krubiński, Tsz Kin Lam, Xutai Ma, Prashant Mathur, Evgeny Matusov, Chandresh Maurya, John McCrae, Kenton Murray, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, Atul Kr. Ojha, John Ortega, Sara Papi, Peter Polák, Adam Pospíšil, Pavel Pecina, Elizabeth Salesky, Nivedita Sethiya, Balaram Sarkar, Jiatong Shi, Claytone Sikasote, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Brian Thompson, Marco Turchi, Alex Waibel, Shinji Watanabe, Patrick Wilken, Petr Zemánek, Rodolfo Zevallos
This paper reports on the shared tasks organized by the 21st IWSLT Conference.
1 code implementation • 27 Aug 2024 • Vilém Zouhar, Pinzhen Chen, Tsz Kin Lam, Nikita Moghe, Barry Haddow
The COMET metric has blazed a trail in the machine translation community, given its strong correlation with human judgements of translation quality.
no code implementations • 29 Feb 2024 • Tsz Kin Lam, Alexandra Birch, Barry Haddow
It also avoids speech discretization in inference and is more robust to the DSU tokenization.
no code implementations • 1 Feb 2024 • Giulio Zhou, Tsz Kin Lam, Alexandra Birch, Barry Haddow
While there has been a growing interest in developing direct speech translation systems to avoid propagating errors and losing non-verbal content, prior work in direct S2TT has struggled to conclusively establish the advantages of integrating the acoustic signal directly into the translation process.
no code implementations • 27 Oct 2022 • Tsz Kin Lam, Shigehiko Schamoni, Stefan Riezler
Data augmentation is a technique for generating new training data from existing data.
Automatic Speech Recognition (ASR)
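As a minimal, generic illustration of the idea described above (not the specific method of the paper): one of the simplest ways to create a new speech-training pair from existing data is to concatenate two utterances along with their transcripts.

```python
# Illustrative only: build a new ASR training pair by concatenating two
# existing (audio, transcript) pairs. Names are hypothetical sketches,
# not from the paper.

def concat_pair(pair_a, pair_b):
    """Each pair is (audio_frames, transcript_tokens); returns their
    concatenation as one longer, previously unseen training pair."""
    audio_a, text_a = pair_a
    audio_b, text_b = pair_b
    return audio_a + audio_b, text_a + text_b

# Example: two short utterances combined into one augmented sample.
augmented = concat_pair(([0.1, 0.2], ["hello"]), ([0.3], ["world"]))
```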
no code implementations • 24 Oct 2022 • Tsz Kin Lam, Eva Hasler, Felix Hieber
Customer feedback can be an important signal for improving commercial machine translation systems.
no code implementations • ACL 2022 • Tsz Kin Lam, Shigehiko Schamoni, Stefan Riezler
End-to-end speech translation relies on data that pair source-language speech inputs with corresponding translations into a target language.
1 code implementation • 3 Apr 2021 • Tsz Kin Lam, Mayumi Ohta, Shigehiko Schamoni, Stefan Riezler
Our method, called Aligned Data Augmentation (ADA) for ASR, replaces transcribed tokens and the speech representations in an aligned manner to generate previously unseen training pairs.
Automatic Speech Recognition (ASR)
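The aligned replacement described above can be sketched as follows. This is a simplified illustration under assumed data structures (token list, frame list, and per-token frame spans; the function name and representation are hypothetical, not from the paper): a transcript token is swapped together with its time-aligned span of acoustic frames, so the new audio and transcription remain consistent.

```python
# Hypothetical sketch of aligned token/frame swapping for ASR augmentation.
# Each utterance is (tokens, frames, spans), where spans[i] = (start, end)
# is the half-open frame range aligned to tokens[i].

def aligned_swap(utt_a, utt_b, idx_a, idx_b):
    """Return a copy of utt_a in which token idx_a (and its aligned
    frames) is replaced by token idx_b of utt_b (and its frames)."""
    tokens_a, frames_a, spans_a = utt_a
    tokens_b, frames_b, spans_b = utt_b

    new_tokens = tokens_a[:idx_a] + [tokens_b[idx_b]] + tokens_a[idx_a + 1:]

    s_a, e_a = spans_a[idx_a]   # frame span cut out of utterance A
    s_b, e_b = spans_b[idx_b]   # frame span spliced in from utterance B
    new_frames = frames_a[:s_a] + frames_b[s_b:e_b] + frames_a[e_a:]

    # Spans after the splice point shift by the change in span length.
    delta = (e_b - s_b) - (e_a - s_a)
    new_spans = (spans_a[:idx_a]
                 + [(s_a, s_a + (e_b - s_b))]
                 + [(s + delta, e + delta) for s, e in spans_a[idx_a + 1:]])
    return new_tokens, new_frames, new_spans
```

For example, swapping the second token of `(["the", "cat"], [0, 1, 2, 3, 4, 5], [(0, 3), (3, 6)])` with the second token of another utterance yields a new, consistent pair that never occurred in the original data.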
no code implementations • 21 Oct 2020 • Tsz Kin Lam, Shigehiko Schamoni, Stefan Riezler
Direct speech translation describes a scenario where only speech inputs and corresponding translations are available.
Automatic Speech Recognition (ASR)
no code implementations • WS 2019 • Tsz Kin Lam, Shigehiko Schamoni, Stefan Riezler
We propose an interactive-predictive neural machine translation framework for easier model personalization using reinforcement and imitation learning.
1 code implementation • 3 May 2018 • Tsz Kin Lam, Julia Kreutzer, Stefan Riezler
We present an approach to interactive-predictive neural machine translation that attempts to reduce human effort in three ways: Firstly, instead of requiring humans to select, correct, or delete segments, we employ the idea of learning from human reinforcements in the form of judgments on the quality of partial translations.
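The reinforcement idea above can be sketched with a toy bandit-style update (a hypothetical simplification for illustration, not the paper's exact model): a log-linear policy over candidate next tokens samples a continuation of the partial translation, receives a scalar human quality judgment, and takes a REINFORCE-style gradient step.

```python
import math
import random

def softmax(scores):
    """Convert per-token scores into a probability distribution."""
    m = max(scores.values())
    exps = {w: math.exp(s - m) for w, s in scores.items()}
    z = sum(exps.values())
    return {w: e / z for w, e in exps.items()}

def reinforce_step(weights, candidates, reward, lr=0.1):
    """Sample one candidate token from the current policy, then shift
    probability mass toward it in proportion to the scalar human reward.
    Mutates `weights` in place and returns the sampled token."""
    probs = softmax({w: weights[w] for w in candidates})
    token = random.choices(list(probs), weights=list(probs.values()))[0]
    # Gradient of log p(token) w.r.t. each weight: 1[w == token] - p(w).
    for w in candidates:
        weights[w] += lr * reward * ((w == token) - probs[w])
    return token
```

With a positive judgment, the sampled token's weight increases while the alternatives' weights decrease, so over repeated interactions the policy drifts toward continuations the human rates highly.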