no code implementations • 23 Feb 2024 • Jintao Jiang, Yingbo Gao, Mohammad Zeineldeen, Zoltan Tuske
In this paper, alternating weak triphone/BPE alignment supervision is proposed to improve end-to-end model training.
no code implementations • 15 Sep 2023 • Mohammad Zeineldeen, Albert Zeyer, Ralf Schlüter, Hermann Ney
We study a streamable attention-based encoder-decoder model in which either the decoder, or both the encoder and decoder, operate on pre-defined, fixed-size windows called chunks.
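The chunk restriction described above can be sketched as a chunk-local attention mask, where a position may only attend within its own fixed-size window. The function name and mask convention below are illustrative, not taken from the paper:

```python
def chunk_attention_mask(seq_len, chunk_size):
    """Boolean self-attention mask for chunk-wise (streamable) attention.

    Position i may attend to position j only if both frames fall into the
    same fixed-size chunk, so latency is bounded by the chunk length.
    """
    return [
        [(i // chunk_size) == (j // chunk_size) for j in range(seq_len)]
        for i in range(seq_len)
    ]
```

For example, with `chunk_size=2` frames 0 and 1 see each other but frame 1 cannot attend to frame 2, which starts the next chunk.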
no code implementations • 8 Jun 2023 • Christian Herold, Yingbo Gao, Mohammad Zeineldeen, Hermann Ney
The integration of language models for neural machine translation has been extensively studied in the past.
1 code implementation • 6 Jun 2023 • Parnia Bahar, Mattia Di Gangi, Nick Rossenbach, Mohammad Zeineldeen
Automatic Arabic diacritization is useful in many applications, ranging from reading support for language learners to accurate pronunciation prediction for downstream tasks like speech synthesis.
no code implementations • 10 Mar 2023 • Mohammad Zeineldeen, Kartik Audhkhasi, Murali Karthick Baskar, Bhuvana Ramabhadran
Soft distillation is another popular KD method that distills the output logits of the teacher model.
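Soft distillation as described here trains the student on the teacher's temperature-softened output distribution. A minimal sketch, assuming the usual KL-divergence objective with the customary T² scaling (function names and the temperature value are illustrative):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The loss is scaled by T^2 so that gradient magnitudes stay comparable
    across temperatures, as is customary in knowledge distillation.
    """
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

Identical teacher and student logits give zero loss; any mismatch gives a positive loss that pulls the student's distribution toward the teacher's.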
no code implementations • 11 Jan 2023 • Christoph Lüscher, Jingjing Xu, Mohammad Zeineldeen, Ralf Schlüter, Hermann Ney
By further adding neural speaker embeddings, we gain additional ~3% relative WER improvement on Hub5'00.
no code implementations • 11 Nov 2022 • Wei Zhou, Haotian Wu, Jingjing Xu, Mohammad Zeineldeen, Christoph Lüscher, Ralf Schlüter, Hermann Ney
Detailed analysis and experimental verification are conducted to show the optimal positions in the ASR neural network (NN) to apply speaker enhancing and adversarial training.
no code implementations • 24 Oct 2022 • Christoph Lüscher, Mohammad Zeineldeen, Zijian Yang, Tina Raissi, Peter Vieting, Khai Le-Duc, Weiyue Wang, Ralf Schlüter, Hermann Ney
Language barriers present a great challenge in our increasingly connected and global world.
no code implementations • 26 Jun 2022 • Mohammad Zeineldeen, Jingjing Xu, Christoph Lüscher, Ralf Schlüter, Hermann Ney
In this work, we investigate various methods for speaker adaptive training (SAT) based on feature-space approaches for a conformer-based acoustic model (AM) on the Switchboard 300h dataset.
Automatic Speech Recognition (ASR) +1
no code implementations • 5 Nov 2021 • Mohammad Zeineldeen, Jingjing Xu, Christoph Lüscher, Wilfried Michel, Alexander Gerstenberger, Ralf Schlüter, Hermann Ney
The recently proposed conformer architecture has been successfully used for end-to-end automatic speech recognition (ASR) architectures achieving state-of-the-art performance on different datasets.
Automatic Speech Recognition (ASR) +1
no code implementations • 18 Oct 2021 • Felix Meyer, Wilfried Michel, Mohammad Zeineldeen, Ralf Schlüter, Hermann Ney
We show on the LibriSpeech (LBS) and Switchboard (SWB) corpora that the model scales for a combination of attention-based encoder-decoder acoustic model and language model can be learned as effectively as with manual tuning.
Automatic Speech Recognition (ASR) +2
no code implementations • 19 Apr 2021 • Wei Zhou, Mohammad Zeineldeen, Zuoyun Zheng, Ralf Schlüter, Hermann Ney
Subword units are commonly used for end-to-end automatic speech recognition (ASR), while a fully acoustic-oriented subword modeling approach is somewhat missing.
Automatic Speech Recognition (ASR) +3
no code implementations • 12 Apr 2021 • Mohammad Zeineldeen, Aleksandr Glushko, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney
Attention-based encoder-decoder (AED) models learn an implicit internal language model (ILM) from the training transcriptions.
no code implementations • 12 Apr 2021 • Nick Rossenbach, Mohammad Zeineldeen, Benedikt Hilmes, Ralf Schlüter, Hermann Ney
We achieve a final word-error-rate of 3.3%/10.0% with a hybrid system on the clean/noisy test-sets, surpassing any previous state-of-the-art systems on Librispeech-100h that do not include unlabeled audio data.
Automatic Speech Recognition (ASR) +2
1 code implementation • 19 May 2020 • Mohammad Zeineldeen, Albert Zeyer, Wei Zhou, Thomas Ng, Ralf Schlüter, Hermann Ney
Following the rationale of end-to-end modeling, CTC, RNN-T or encoder-decoder-attention models for automatic speech recognition (ASR) use graphemes or grapheme-based subword units based on e.g. byte-pair encoding (BPE).
Automatic Speech Recognition (ASR) +1
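The grapheme-based subword units mentioned in the entry above are typically learned with byte-pair encoding, which repeatedly merges the most frequent adjacent symbol pair. A minimal sketch of the merge-learning step (function names and the toy vocabulary are illustrative, not from the paper):

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in a symbol tuple
    with the concatenated unit."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe_merges(vocab, num_merges):
    """Learn BPE merge operations from a {symbol-tuple: count} vocabulary.

    Each step counts adjacent symbol pairs weighted by word frequency and
    merges the most frequent pair into a single subword unit.
    """
    vocab = dict(vocab)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = {merge_pair(w, best): f for w, f in vocab.items()}
    return merges
```

On a toy vocabulary such as `{("l","o","w"): 5, ("l","o","w","e","r"): 2, ("n","e","w","e","s","t"): 6}`, the first learned merge is `("w", "e")`, since that pair occurs 8 times in total.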