no code implementations • IWSLT (EMNLP) 2018 • Matthias Sperber, Ngoc-Quan Pham, Thai-Son Nguyen, Jan Niehues, Markus Müller, Thanh-Le Ha, Sebastian Stüker, Alex Waibel
The baseline system is a cascade of an ASR system, a system to segment the ASR output and a neural machine translation system.
no code implementations • IWSLT 2016 • Yang Zhang, Jan Niehues, Alexander Waibel
Neural models have recently shown substantial improvements in the performance of phrase-based machine translation.
no code implementations • EMNLP (IWSLT) 2019 • Ngoc-Quan Pham, Thai-Son Nguyen, Thanh-Le Ha, Juan Hussain, Felix Schneider, Jan Niehues, Sebastian Stüker, Alexander Waibel
This paper describes KIT’s submission to the IWSLT 2019 Speech Translation task on two sub-tasks corresponding to two different datasets.
no code implementations • EMNLP (IWSLT) 2019 • Jan Niehues, Rolando Cattoni, Sebastian Stüker, Matteo Negri, Marco Turchi, Thanh-Le Ha, Elizabeth Salesky, Ramon Sanabria, Loic Barrault, Lucia Specia, Marcello Federico
The IWSLT 2019 evaluation campaign featured three tasks: speech translation of (i) TED talks and (ii) How2 instructional videos from English into German and Portuguese, and (iii) text translation of TED talks from English into Czech.
no code implementations • WMT (EMNLP) 2021 • Danni Liu, Jan Niehues
We present our development of the multilingual machine translation system for the large-scale multilingual machine translation task at WMT 2021.
no code implementations • IWSLT (ACL) 2022 • Ngoc-Quan Pham, Tuan Nam Nguyen, Thai-Binh Nguyen, Danni Liu, Carlos Mullov, Jan Niehues, Alexander Waibel
Pretrained models in acoustic and textual modalities can potentially improve speech translation for both Cascade and End-to-end approaches.
no code implementations • IWSLT 2016 • Eunah Cho, Jan Niehues, Thanh-Le Ha, Matthias Sperber, Mohammed Mediani, Alex Waibel
In addition, we investigated methods to combine NMT systems that encode the input as well as the output differently.
no code implementations • IWSLT 2016 • Eunah Cho, Jan Niehues, Thanh-Le Ha, Alex Waibel
In this paper, we investigate a multilingual approach for speech disfluency removal.
no code implementations • IWSLT 2016 • Mauro Cettolo, Jan Niehues, Sebastian Stüker, Luisa Bentivogli, Rolando Cattoni, Marcello Federico
The IWSLT 2016 Evaluation Campaign featured two tasks: the translation of talks and the translation of video conference conversations.
Automatic Speech Recognition (ASR)
no code implementations • IWSLT (EMNLP) 2018 • Jan Niehues, Rolando Cattoni, Sebastian Stüker, Mauro Cettolo, Marco Turchi, Marcello Federico
The International Workshop of Spoken Language Translation (IWSLT) 2018 Evaluation Campaign featured two tasks: low-resource machine translation and speech translation.
no code implementations • IWSLT 2017 • Eunah Cho, Jan Niehues, Alex Waibel
Experiments show that generalizing rare and unknown words greatly improves the punctuation insertion performance, reaching up to 8.8 points of improvement in F-score when applied to the out-of-domain test scenario.
no code implementations • IWSLT 2017 • Ngoc-Quan Pham, Matthias Sperber, Elizabeth Salesky, Thanh-Le Ha, Jan Niehues, Alexander Waibel
For the SLT track, in addition to a monolingual neural translation system used to generate correct punctuations and true cases of the data prior to training our multilingual system, we introduced a noise model in order to make our system more robust.
no code implementations • IWSLT 2017 • Mauro Cettolo, Marcello Federico, Luisa Bentivogli, Jan Niehues, Sebastian Stüker, Katsuhito Sudoh, Koichiro Yoshino, Christian Federmann
The IWSLT 2017 evaluation campaign has organised three tasks.
no code implementations • ACL (IWSLT) 2021 • Danni Liu, Jan Niehues
The task in this track is to build multilingual speech translation systems in supervised and zero-shot directions.
no code implementations • ACL (IWSLT) 2021 • Antonios Anastasopoulos, Ondřej Bojar, Jacob Bremerman, Roldano Cattoni, Maha Elbayad, Marcello Federico, Xutai Ma, Satoshi Nakamura, Matteo Negri, Jan Niehues, Juan Pino, Elizabeth Salesky, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Alexander Waibel, Changhan Wang, Matthew Wiesner
The evaluation campaign of the International Conference on Spoken Language Translation (IWSLT 2021) featured four shared tasks this year: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Multilingual speech translation, (iv) Low-resource speech translation.
no code implementations • IWSLT (ACL) 2022 • Antonios Anastasopoulos, Loïc Barrault, Luisa Bentivogli, Marcely Zanon Boito, Ondřej Bojar, Roldano Cattoni, Anna Currey, Georgiana Dinu, Kevin Duh, Maha Elbayad, Clara Emmanuel, Yannick Estève, Marcello Federico, Christian Federmann, Souhir Gahbiche, Hongyu Gong, Roman Grundkiewicz, Barry Haddow, Benjamin Hsu, Dávid Javorský, Vĕra Kloudová, Surafel Lakew, Xutai Ma, Prashant Mathur, Paul McNamee, Kenton Murray, Maria Nǎdejde, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, John Ortega, Juan Pino, Elizabeth Salesky, Jiatong Shi, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Yogesh Virkar, Alexander Waibel, Changhan Wang, Shinji Watanabe
The evaluation campaign of the 19th International Conference on Spoken Language Translation featured eight shared tasks: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Speech to speech translation, (iv) Low-resource speech translation, (v) Multilingual speech translation, (vi) Dialect speech translation, (vii) Formality control for speech translation, (viii) Isometric speech translation.
no code implementations • IWSLT 2017 • Matthias Sperber, Jan Niehues, Alex Waibel
We note that, unlike our baseline model, models trained on noisy data are able to generate outputs of proper length even for noisy inputs, while gradually reducing output length for higher amounts of noise, as might also be expected from a human translator.
no code implementations • 22 Sep 2023 • Renhan Lou, Jan Niehues
In this work, we propose a semi-automatic technique to extract these explanations from a large parallel corpus.
1 code implementation • 15 Sep 2023 • Danni Liu, Jan Niehues
Given recent progress in pretrained massively multilingual translation models, we use them as a foundation to transfer the attribute controlling capabilities to languages without supervised data.
no code implementations • 7 Aug 2023 • Christian Huber, Tu Anh Dinh, Carlos Mullov, Ngoc Quan Pham, Thai Binh Nguyen, Fabian Retkowski, Stefan Constantin, Enes Yavuz Ugan, Danni Liu, Zhaolin Li, Sai Koneru, Jan Niehues, Alexander Waibel
Secondly, we compare different approaches to low-latency speech translation using this framework.
1 code implementation • 8 Jun 2023 • Danni Liu, Thai Binh Nguyen, Sai Koneru, Enes Yavuz Ugan, Ngoc-Quan Pham, Tuan-Nam Nguyen, Tu Anh Dinh, Carlos Mullov, Alexander Waibel, Jan Niehues
In this paper, we describe our speech translation system for the multilingual track of IWSLT 2023, which evaluates translation quality on scientific conference talks.
no code implementations • 26 May 2023 • Lena Cabrera, Jan Niehues
Neural machine translation (NMT) models often suffer from gender biases that harm users and society at large.
1 code implementation • 12 May 2023 • Tu Anh Dinh, Jan Niehues
Quality Estimation (QE) is the task of predicting the quality of Machine Translation (MT) system output, without using any gold-standard translation references.
no code implementations • 5 May 2023 • Zhong Zhou, Jan Niehues, Alex Waibel
We examine two approaches: (1) selecting seed sentences that best jump-start translation in a new language, with a view to generalizing to the remainder of a larger target text, and (2) adapting large, general multilingual translation engines trained on many other languages to focus on a specific text in a new, unknown language.
no code implementations • 21 Nov 2022 • Ngoc-Quan Pham, Jan Niehues, Alexander Waibel
Multilingual speech recognition with neural networks is often implemented with batch learning, in which all of the languages are available before training.
no code implementations • 9 Nov 2022 • Zhaolin Li, Jan Niehues
When building state-of-the-art speech translation models, the need for large computational resources is a significant obstacle due to the large training data size and complex models.
1 code implementation • 2 Nov 2022 • Danni Liu, Jan Niehues
In this work, we discretize the encoder output latent space of multilingual models by assigning encoder states to entries in a codebook, which in effect represents source sentences in a new artificial language.
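The codebook idea in this entry can be sketched as simple nearest-neighbour quantization: each continuous encoder state is replaced by the id of its closest codebook entry. This is a minimal NumPy illustration under assumed shapes, not the paper's actual model; the codebook here is random purely for demonstration.

```python
import numpy as np

def quantize(encoder_states: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Assign each encoder state to its nearest codebook entry (Euclidean)."""
    # dists[i, j] = squared distance between state i and code j
    dists = ((encoder_states[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)  # one discrete code id per state

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))                     # 8 codes of dimension 4
states = codebook[[2, 5, 2]] + 0.01 * rng.normal(size=(3, 4))  # noisy copies
print(quantize(states, codebook))                      # → [2 5 2]
```

The resulting id sequence acts like a sentence in a new artificial language shared across source languages.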
no code implementations • 24 May 2022 • Ngoc-Quan Pham, Alex Waibel, Jan Niehues
Multilingual speech recognition with supervised learning has achieved great results as reflected in recent research.
1 code implementation • LREC 2022 • Pedro Jeuris, Jan Niehues
In contrast, activity in the area of speech-to-speech translation is still limited, although it is essential for overcoming the language barrier.
no code implementations • IWSLT (ACL) 2022 • Peter Polák, Ngoc-Quan Pham, Tuan-Nam Nguyen, Danni Liu, Carlos Mullov, Jan Niehues, Ondřej Bojar, Alexander Waibel
In this paper, we describe our submission to the Simultaneous Speech Translation at IWSLT 2022.
no code implementations • 28 Mar 2022 • Shashank Subramanya, Jan Niehues
Based on a technique to adapt end-to-end monolingual models, we investigate multilingual models and different architectures (end-to-end and cascade) on the ability to perform online speech translation.
1 code implementation • 26 Jan 2022 • Tu Anh Dinh, Danni Liu, Jan Niehues
We investigate whether these ideas can be applied to speech translation, by building ST models trained on speech transcription and text translation data.
no code implementations • 14 Jan 2022 • Sai Koneru, Danni Liu, Jan Niehues
Although AL is shown to be helpful with large budgets, it is not enough to build high-quality translation systems in these low-resource conditions.
no code implementations • EACL 2021 • Jan Niehues, Elizabeth Salesky, Marco Turchi, Matteo Negri
Speech translation is the translation of speech in one language typically to text in another, traditionally accomplished through a combination of automatic speech recognition and machine translation.
Automatic Speech Recognition (ASR)
no code implementations • EACL (DravidianLangTech) 2021 • Sai Koneru, Danni Liu, Jan Niehues
We show that unifying the writing systems is essential in unsupervised translation between the Dravidian languages.
no code implementations • EACL 2021 • Jan Niehues
For humans, as well as for machine translation, bilingual dictionaries are a promising knowledge source to continuously integrate new knowledge.
1 code implementation • ACL 2021 • Danni Liu, Jan Niehues, James Cross, Francisco Guzmán, Xian Li
The difficulty of generalizing to new translation directions suggests the model representations are highly specific to those language pairs seen in training.
no code implementations • WS 2020 • Ebrahim Ansari, Amittai Axelrod, Nguyen Bach, Ondřej Bojar, Roldano Cattoni, Fahim Dalvi, Nadir Durrani, Marcello Federico, Christian Federmann, Jiatao Gu, Fei Huang, Kevin Knight, Xutai Ma, Ajay Nagesh, Matteo Negri, Jan Niehues, Juan Pino, Elizabeth Salesky, Xing Shi, Sebastian Stüker, Marco Turchi, Alexander Waibel, Changhan Wang
The evaluation campaign of the International Conference on Spoken Language Translation (IWSLT 2020) featured six challenge tracks this year: (i) Simultaneous speech translation, (ii) Video speech translation, (iii) Offline speech translation, (iv) Conversational speech translation, (v) Open domain translation, and (vi) Non-native speech translation.
1 code implementation • WS 2020 • Danni Liu, Jan Niehues, Gerasimos Spanakis
The experiments show that with limited data far less than needed for training a model from scratch, we can adapt a Transformer-based ASR model to incorporate both transcription and compression capabilities.
Automatic Speech Recognition (ASR)
1 code implementation • 22 May 2020 • Danni Liu, Gerasimos Spanakis, Jan Niehues
On How2 English-Portuguese speech translation, we reduce latency to 0.7 seconds (-84% rel.)
Sequence-To-Sequence Speech Recognition
Speech Recognition
no code implementations • 20 May 2020 • Ngoc-Quan Pham, Thanh-Le Ha, Tuan-Nam Nguyen, Thai-Son Nguyen, Elizabeth Salesky, Sebastian Stueker, Jan Niehues, Alexander Waibel
We also show that this model is able to better utilize synthetic data than the Transformer, and adapts better to variable sentence segmentation quality for speech translation.
no code implementations • AMTA 2020 • Jan Niehues
In this paper, we explore one of these, the generation of constraint translation.
no code implementations • 22 Mar 2020 • Thai Son Nguyen, Jan Niehues, Eunah Cho, Thanh-Le Ha, Kevin Kilgour, Markus Muller, Matthias Sperber, Sebastian Stueker, Alex Waibel
User studies have shown that reducing the latency of our simultaneous lecture translation system should be the most important goal.
Automatic Speech Recognition (ASR)
no code implementations • 29 Oct 2019 • Thai-Son Nguyen, Sebastian Stueker, Jan Niehues, Alex Waibel
Sequence-to-Sequence (S2S) models recently started to show state-of-the-art performance for automatic speech recognition (ASR).
Automatic Speech Recognition (ASR)
no code implementations • WS 2019 • Jan Niehues, Ngoc-Quan Pham
We show improvements on segment-level confidence estimation as well as on confidence estimation for source tokens.
Automatic Speech Recognition (ASR)
no code implementations • WS 2019 • Stefan Constantin, Jan Niehues, Alex Waibel
The state-of-the-art neural network architectures make it possible to create spoken language understanding systems with high quality and fast processing time.
Natural Language Understanding
Spoken Language Understanding
no code implementations • WS 2019 • Ngoc-Quan Pham, Jan Niehues, Thanh-Le Ha, Alex Waibel
We investigated the behaviour of such models on the standard IWSLT 2017 multilingual dataset.
no code implementations • 30 Apr 2019 • Ngoc-Quan Pham, Thai-Son Nguyen, Jan Niehues, Markus Müller, Sebastian Stüker, Alexander Waibel
Recently, end-to-end sequence-to-sequence models for speech recognition have gained significant interest in the research community.
no code implementations • TACL 2019 • Matthias Sperber, Graham Neubig, Jan Niehues, Alex Waibel
Speech translation has traditionally been approached through cascaded models consisting of a speech recognizer trained on a corpus of transcribed speech, and a machine translation system trained on parallel texts.
no code implementations • 17 Dec 2018 • Stefan Constantin, Jan Niehues, Alex Waibel
When building a neural network-based Natural Language Understanding component, one main challenge is to collect enough training data.
no code implementations • 7 Nov 2018 • Elizabeth Salesky, Susanne Burger, Jan Niehues, Alex Waibel
We introduce a corpus of cleaned target data for the Fisher Spanish-English dataset for this task.
no code implementations • 19 Oct 2018 • Elizabeth Salesky, Andrew Runge, Alex Coda, Jan Niehues, Graham Neubig
However, the granularity of these subword units is a hyperparameter to be tuned for each language and task, using methods such as grid search.
no code implementations • WS 2018 • Ngoc-Quan Pham, Jan Niehues, Alexander Waibel
We present our experiments in the scope of the news translation task at WMT 2018, in the English→German direction.
no code implementations • WS 2018 • Ngoc-Quan Pham, Jan Niehues, Alex Waibel
Neural machine translation (NMT) has significantly improved the quality of automatic translation models.
no code implementations • COLING 2018 • Florian Dessloch, Thanh-Le Ha, Markus Müller, Jan Niehues, Thai-Son Nguyen, Ngoc-Quan Pham, Elizabeth Salesky, Matthias Sperber, Sebastian Stüker, Thomas Zenkel, Alexander Waibel
Combining these techniques, we are able to provide an adapted speech translation system for several European languages.
no code implementations • 1 Aug 2018 • Jan Niehues, Ngoc-Quan Pham, Thanh-Le Ha, Matthias Sperber, Alex Waibel
After adaptation, we are able to reduce the number of corrections displayed during incremental output construction by 45%, without a decrease in translation quality.
no code implementations • 27 Jul 2018 • Patrick Huber, Jan Niehues, Alex Waibel
Our approach overcomes recent limitations with extended narratives through a multi-layered computational approach to generate an abstract context representation.
1 code implementation • WS 2018 • Jörg Franke, Jan Niehues, Alex Waibel
Deep learning models are often not easily adaptable to new tasks and require task-specific adjustments.
1 code implementation • 26 Mar 2018 • Matthias Sperber, Jan Niehues, Graham Neubig, Sebastian Stüker, Alex Waibel
Self-attention is a method of encoding sequences of vectors by relating these vectors to each other based on pairwise similarities.
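The pairwise-similarity encoding described in this entry can be illustrated with a minimal, single-head NumPy sketch. It omits the learned query/key/value projections a real Transformer layer would add; it only shows the core idea of mixing each vector with all others, weighted by softmax over dot-product similarities.

```python
import numpy as np

def self_attention(x: np.ndarray) -> np.ndarray:
    """Encode a sequence: each output is a similarity-weighted sum of all inputs."""
    scores = x @ x.T / np.sqrt(x.shape[1])             # pairwise similarities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)      # softmax: rows sum to 1
    return weights @ x                                 # weighted sums of inputs

x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])    # toy 3-step sequence
out = self_attention(x)
print(out.shape)  # (3, 2)
```

Because each output row is a convex combination of the input rows, every coordinate stays within the range spanned by the inputs.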
1 code implementation • LREC 2018 • Patrick Huber, Jan Niehues, Alex Waibel
We present a new approach to evaluate computational models for the task of text understanding by the means of out-of-context error detection.
no code implementations • 6 Mar 2018 • Stefan Constantin, Jan Niehues, Alex Waibel
Furthermore, by using a feedforward neural network, we are able to generate the output word by word and are no longer restricted to a fixed number of possible response candidates.
1 code implementation • IWSLT 2017 • Thanh-Le Ha, Jan Niehues, Alexander Waibel
In this paper, we propose two strategies which can be applied to a multilingual neural machine translation system in order to better tackle zero-shot scenarios despite not having any parallel corpus.
no code implementations • 15 Sep 2017 • Matthias Sperber, Graham Neubig, Jan Niehues, Satoshi Nakamura, Alex Waibel
We investigate the problem of manually correcting errors from an automatic speech transcript in a cost-sensitive fashion.
no code implementations • WS 2017 • Jan-Thorsten Peter, Hermann Ney, Ondřej Bojar, Ngoc-Quan Pham, Jan Niehues, Alex Waibel, Franck Burlot, François Yvon, Mārcis Pinnis, Valters Šics, Jasmijn Bastings, Miguel Rios, Wilker Aziz, Philip Williams, Frédéric Blain, Lucia Specia
no code implementations • 15 Aug 2017 • Thomas Zenkel, Ramon Sanabria, Florian Metze, Jan Niehues, Matthias Sperber, Sebastian Stüker, Alex Waibel
The CTC loss function maps an input sequence of observable feature vectors to an output sequence of symbols.
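The mapping this entry describes, from frame-level feature vectors to a shorter symbol sequence, rests on the standard CTC collapse rule: merge repeated labels, then remove blanks. The sketch below is a generic illustration of that rule, not code from the paper.

```python
def ctc_collapse(path: str, blank: str = "-") -> str:
    """Map a frame-level label path to an output symbol sequence
    by merging repeated labels and then dropping blank symbols."""
    out = []
    prev = None
    for sym in path:
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return "".join(out)

print(ctc_collapse("hh-e-ll-lo"))  # → "hello"
```

Note how the blank lets CTC emit the same symbol twice in a row ("ll"): repeats are only merged when not separated by a blank.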
no code implementations • WS 2017 • Jan Niehues, Eunah Cho
Linguistic resources such as part-of-speech (POS) tags have been extensively used in statistical machine translation (SMT) frameworks and have yielded better performances.
no code implementations • WS 2017 • Jan Niehues, Eunah Cho, Thanh-Le Ha, Alex Waibel
By separating the search space and the modeling using $n$-best list reranking, we analyze the influence of both parts of an NMT system independently.
no code implementations • EMNLP 2017 • Matthias Sperber, Graham Neubig, Jan Niehues, Alex Waibel
In this work, we extend the TreeLSTM (Tai et al., 2015) into a LatticeLSTM that is able to consume word lattices, and can be used as encoder in an attentional encoder-decoder model.
no code implementations • COLING 2016 • Matthias Sperber, Graham Neubig, Jan Niehues, Sebastian Stüker, Alex Waibel
Evaluating the quality of output from language processing systems such as machine translation or speech recognition is an essential step in ensuring that they are sufficient for practical use.
no code implementations • IWSLT 2016 • Thanh-Le Ha, Jan Niehues, Alexander Waibel
In this paper, we present our first attempts in building a multilingual Neural Machine Translation framework under a unified approach.
no code implementations • COLING 2016 • Jan Niehues, Eunah Cho, Thanh-Le Ha, Alex Waibel
We analyzed the influence of the quality of the initial system on the final result.
no code implementations • WS 2016 • Jan-Thorsten Peter, Tamer Alkhouli, Hermann Ney, Matthias Huck, Fabienne Braune, Alexander Fraser, Aleš Tamchyna, Ondřej Bojar, Barry Haddow, Rico Sennrich, Frédéric Blain, Lucia Specia, Jan Niehues, Alex Waibel, Alexandre Allauzen, Lauriane Aufrant, Franck Burlot, Elena Knyazeva, Thomas Lavergne, François Yvon, Mārcis Pinnis, Stella Frank
Ranked #12 on Machine Translation on WMT2016 English-Romanian
no code implementations • NAACL 2016 • Markus Müller, Thai Son Nguyen, Jan Niehues, Eunah Cho, Bastian Krüger, Thanh-Le Ha, Kevin Kilgour, Matthias Sperber, Mohammed Mediani, Sebastian Stüker, Alex Waibel
no code implementations • 28 Apr 2015 • Thanh-Le Ha, Jan Niehues, Alex Waibel
In this paper we combine the advantages of a model using global source sentence contexts, the Discriminative Word Lexicon, and neural networks.
no code implementations • LREC 2014 • Teresa Herrmann, Jan Niehues, Alex Waibel
However, it is a crucial aspect for humans when deciding on translation quality.
no code implementations • LREC 2012 • Marcello Federico, Sebastian Stüker, Luisa Bentivogli, Michael Paul, Mauro Cettolo, Teresa Herrmann, Jan Niehues, Giovanni Moretti
We report here on the eighth evaluation campaign organized in 2011 by the IWSLT workshop series.