no code implementations • 17 Oct 2023 • Peter Polák
Simultaneous speech translation (SST) aims to provide real-time translation of spoken language, even before the speaker finishes their sentence.
no code implementations • 20 Sep 2023 • Peter Polák, Ondřej Bojar
On a diverse set of language pairs and in- and out-of-domain data, we show that the proposed approach achieves state-of-the-art quality at no additional computational cost.
no code implementations • 20 Sep 2023 • Peter Polák, Brian Yan, Shinji Watanabe, Alex Waibel, Ondřej Bojar
Further, this method lacks mechanisms for controlling the quality vs. latency tradeoff.
no code implementations • 26 May 2023 • Dominik Macháček, Peter Polák, Ondřej Bojar, Raj Dabre
Automatic speech translation is sensitive to speech recognition errors, but in a multilingual scenario, the same content may be available in various languages via simultaneous interpreting, dubbing or subtitling.
1 code implementation • 10 Apr 2023 • Brian Yan, Jiatong Shi, Yun Tang, Hirofumi Inaguma, Yifan Peng, Siddharth Dalmia, Peter Polák, Patrick Fernandes, Dan Berrebbi, Tomoki Hayashi, Xiaohui Zhang, Zhaoheng Ni, Moto Hira, Soumi Maiti, Juan Pino, Shinji Watanabe
ESPnet-ST-v2 is a revamp of the open-source ESPnet-ST toolkit necessitated by the broadening interests of the spoken language translation community.
no code implementations • LREC 2022 • Peter Polák, Muskaan Singh, Anna Nedoluzhko, Ondřej Bojar
To facilitate research in this area, we present ALIGNMEET, a comprehensive tool for meeting annotation, alignment, and evaluation.
no code implementations • IWSLT (ACL) 2022 • Peter Polák, Ngoc-Quan Pham, Tuan-Nam Nguyen, Danni Liu, Carlos Mullov, Jan Niehues, Ondřej Bojar, Alexander Waibel
In this paper, we describe our submission to the Simultaneous Speech Translation task at IWSLT 2022.
no code implementations • 2 Sep 2021 • Peter Polák, Ondřej Bojar
End-to-end neural automatic speech recognition systems have recently achieved state-of-the-art results, but they require large datasets and extensive computing resources.