Search Results for author: Ralf Schlüter

Found 56 papers, 10 papers with code

On the Relevance of Phoneme Duration Variability of Synthesized Training Data for Automatic Speech Recognition

no code implementations · 12 Oct 2023 · Nick Rossenbach, Benedikt Hilmes, Ralf Schlüter

Synthetic data generated by text-to-speech (TTS) systems can be used to improve automatic speech recognition (ASR) systems in low-resource or domain mismatch tasks.

Tasks: Automatic Speech Recognition (ASR) +1

Investigating the Effect of Language Models in Sequence Discriminative Training for Neural Transducers

no code implementations · 11 Oct 2023 · Zijian Yang, Wei Zhou, Ralf Schlüter, Hermann Ney

In this work, we investigate the effect of language models (LMs) with different context lengths and label units (phoneme vs. word) used in sequence discriminative training for phoneme-based neural transducers.

End-to-End Training of a Neural HMM with Label and Transition Probabilities

1 code implementation · 4 Oct 2023 · Daniel Mann, Tina Raissi, Wilfried Michel, Ralf Schlüter, Hermann Ney

We investigate recognition results and additionally Viterbi alignments of our models.

On the Relation between Internal Language Model and Sequence Discriminative Training for Neural Transducers

no code implementations · 25 Sep 2023 · Zijian Yang, Wei Zhou, Ralf Schlüter, Hermann Ney

Empirically, we show that ILM subtraction and sequence discriminative training achieve similar performance across a wide range of experiments on Librispeech, including both MMI and minimum Bayes risk (MBR) criteria, as well as neural transducers and LMs of both full and limited context.

Tasks: Language Modelling, Relation +2

Chunked Attention-based Encoder-Decoder Model for Streaming Speech Recognition

no code implementations · 15 Sep 2023 · Mohammad Zeineldeen, Albert Zeyer, Ralf Schlüter, Hermann Ney

We study a streamable attention-based encoder-decoder model in which either the decoder, or both the encoder and decoder, operate on pre-defined, fixed-size windows called chunks.

Tasks: Speech Recognition

Comparative Analysis of the wav2vec 2.0 Feature Extractor

no code implementations · 8 Aug 2023 · Peter Vieting, Ralf Schlüter, Hermann Ney

In this work, we study its capability to replace the standard feature extraction methods in a connectionist temporal classification (CTC) ASR model and compare it to an alternative neural FE.

Tasks: Automatic Speech Recognition (ASR) +1

RASR2: The RWTH ASR Toolkit for Generic Sequence-to-sequence Speech Recognition

no code implementations · 28 May 2023 · Wei Zhou, Eugen Beck, Simon Berger, Ralf Schlüter, Hermann Ney

Modern public ASR tools usually provide rich support for training various sequence-to-sequence (S2S) models, but only rather simple support for decoding in open-vocabulary scenarios.

Tasks: Sequence-To-Sequence Speech Recognition, Speech Recognition

End-to-End Speech Recognition: A Survey

no code implementations · 3 Mar 2023 · Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, Shinji Watanabe

In the last decade of automatic speech recognition (ASR) research, the introduction of deep learning brought considerable reductions in word error rate of more than 50% relative, compared to modeling without deep learning.

Tasks: Automatic Speech Recognition (ASR) +2
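The survey above quotes WER gains in relative terms. As a small illustrative sketch (not code from the paper), "more than 50% relative" refers to the following arithmetic; the function name is my own:

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative reduction in word error rate, in percent.

    Relative improvement compares the WER drop to the baseline WER,
    not to an absolute percentage-point difference.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer

# A drop from 12.0% to 5.5% absolute WER is ~54% relative,
# i.e. "more than 50% relative" in the survey's sense.
print(round(relative_wer_reduction(12.0, 5.5), 1))
```

Note that a system halving its WER (e.g. 10.0% to 5.0%) is exactly a 50% relative reduction even though the absolute change is only 5 percentage points.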

Analyzing And Improving Neural Speaker Embeddings for ASR

no code implementations · 11 Jan 2023 · Christoph Lüscher, Jingjing Xu, Mohammad Zeineldeen, Ralf Schlüter, Hermann Ney

By further adding neural speaker embeddings, we gain additional ~3% relative WER improvement on Hub5'00.

Tasks: Speaker Verification

Lattice-Free Sequence Discriminative Training for Phoneme-Based Neural Transducers

no code implementations · 7 Dec 2022 · Zijian Yang, Wei Zhou, Ralf Schlüter, Hermann Ney

Compared to the N-best-list based minimum Bayes risk objectives, lattice-free methods gain 40% - 70% relative training time speedup with a small degradation in performance.

Tasks: Automatic Speech Recognition (ASR) +1

Enhancing and Adversarial: Improve ASR with Speaker Labels

no code implementations · 11 Nov 2022 · Wei Zhou, Haotian Wu, Jingjing Xu, Mohammad Zeineldeen, Christoph Lüscher, Ralf Schlüter, Hermann Ney

Detailed analysis and experimental verification are conducted to show the optimal positions in the ASR neural network (NN) to apply speaker enhancing and adversarial training.

Tasks: Multi-Task Learning

Monotonic segmental attention for automatic speech recognition

1 code implementation · 26 Oct 2022 · Albert Zeyer, Robin Schmitt, Wei Zhou, Ralf Schlüter, Hermann Ney

We restrict the decoder attention to segments to avoid quadratic runtime of global attention, better generalize to long sequences, and eventually enable streaming.

Tasks: Automatic Speech Recognition (ASR)

Improving the Training Recipe for a Robust Conformer-based Hybrid Model

no code implementations · 26 Jun 2022 · Mohammad Zeineldeen, Jingjing Xu, Christoph Lüscher, Ralf Schlüter, Hermann Ney

In this work, we investigate various methods for speaker adaptive training (SAT) based on feature-space approaches for a conformer-based acoustic model (AM) on the Switchboard 300h dataset.

Tasks: Automatic Speech Recognition (ASR) +1

Efficient Training of Neural Transducer for Speech Recognition

no code implementations · 22 Apr 2022 · Wei Zhou, Wilfried Michel, Ralf Schlüter, Hermann Ney

In this work, we propose an efficient 3-stage progressive training pipeline to build high-performing neural transducer models from scratch with very limited computation resources in a reasonably short time period.

Tasks: Speech Recognition

Self-Normalized Importance Sampling for Neural Language Modeling

no code implementations · 11 Nov 2021 · Zijian Yang, Yingbo Gao, Alexander Gerstenberger, Jintao Jiang, Ralf Schlüter, Hermann Ney

Compared to our previous work, the criteria considered in this work are self-normalized and there is no need to further conduct a correction step.

Tasks: Automatic Speech Recognition (ASR) +2

Conformer-based Hybrid ASR System for Switchboard Dataset

no code implementations · 5 Nov 2021 · Mohammad Zeineldeen, Jingjing Xu, Christoph Lüscher, Wilfried Michel, Alexander Gerstenberger, Ralf Schlüter, Hermann Ney

The recently proposed conformer architecture has been successfully used for end-to-end automatic speech recognition (ASR) architectures achieving state-of-the-art performance on different datasets.

Tasks: Automatic Speech Recognition (ASR) +1

Automatic Learning of Subword Dependent Model Scales

no code implementations · 18 Oct 2021 · Felix Meyer, Wilfried Michel, Mohammad Zeineldeen, Ralf Schlüter, Hermann Ney

We show on the LibriSpeech (LBS) and Switchboard (SWB) corpora that the model scales for a combination of attention-based encoder-decoder acoustic model and language model can be learned as effectively as with manual tuning.

Tasks: Automatic Speech Recognition (ASR) +2

Efficient Sequence Training of Attention Models using Approximative Recombination

no code implementations · 18 Oct 2021 · Nils-Philipp Wynands, Wilfried Michel, Jan Rosendahl, Ralf Schlüter, Hermann Ney

Lastly, it is shown that this technique can be used to effectively perform sequence discriminative training for attention-based encoder-decoder acoustic models on the LibriSpeech task.

Tasks: Automatic Speech Recognition (ASR) +1

On Language Model Integration for RNN Transducer based Speech Recognition

no code implementations · 13 Oct 2021 · Wei Zhou, Zuoyun Zheng, Ralf Schlüter, Hermann Ney

In this work, we study various ILM correction-based LM integration methods formulated in a common RNN-T framework.

Tasks: Language Modelling, Speech Recognition +1

Why does CTC result in peaky behavior?

1 code implementation · 31 May 2021 · Albert Zeyer, Ralf Schlüter, Hermann Ney

The peaky behavior of CTC models is well known experimentally.
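As background for the peaky behavior in question (an illustrative sketch, not code from the paper): CTC scores frame-level label paths that are collapsed into the output sequence by merging repeated labels and removing blanks, so a path that is blank almost everywhere with single-frame label "peaks" yields the same output as a smooth alignment.

```python
BLANK = "_"  # blank symbol, written "_" here for readability

def ctc_collapse(frame_labels):
    """Map a frame-level CTC path to an output label sequence:
    first merge consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != BLANK:
            out.append(lab)
        prev = lab
    return "".join(out)

# A "peaky" path (mostly blanks, one frame per label) and a smooth
# path (labels spread over many frames) collapse to the same output.
print(ctc_collapse(list("__c__a__t__")))  # peaky alignment
print(ctc_collapse(list("ccaaattt")))     # smooth alignment
```

Both calls produce "cat", which is why CTC training can concentrate probability mass on peaky alignments without changing the recognized sequence.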

On Sampling-Based Training Criteria for Neural Language Modeling

no code implementations · 21 Apr 2021 · Yingbo Gao, David Thulke, Alexander Gerstenberger, Khoa Viet Tran, Ralf Schlüter, Hermann Ney

As the vocabulary size of modern word-based language models becomes ever larger, many sampling-based training criteria are proposed and investigated.

Tasks: Automatic Speech Recognition (ASR) +2

Acoustic Data-Driven Subword Modeling for End-to-End Speech Recognition

no code implementations · 19 Apr 2021 · Wei Zhou, Mohammad Zeineldeen, Zuoyun Zheng, Ralf Schlüter, Hermann Ney

Subword units are commonly used for end-to-end automatic speech recognition (ASR), while a fully acoustic-oriented subword modeling approach is somewhat missing.

Tasks: Automatic Speech Recognition (ASR) +3

The Impact of ASR on the Automatic Analysis of Linguistic Complexity and Sophistication in Spontaneous L2 Speech

no code implementations · 17 Apr 2021 · Yu Qiao, Wei Zhou, Elma Kerz, Ralf Schlüter

In recent years, automated approaches to assessing linguistic complexity in second language (L2) writing have made significant progress in gauging learner performance, predicting human ratings of the quality of learner productions, and benchmarking L2 development.

Tasks: Benchmarking

Equivalence of Segmental and Neural Transducer Modeling: A Proof of Concept

no code implementations · 13 Apr 2021 · Wei Zhou, Albert Zeyer, André Merboldt, Ralf Schlüter, Hermann Ney

With the advent of direct models in automatic speech recognition (ASR), the formerly prevalent frame-wise acoustic modeling based on hidden Markov models (HMM) diversified into a number of modeling architectures like encoder-decoder attention models, transducer models and segmental models (direct HMM).

Tasks: Automatic Speech Recognition (ASR) +1

Comparing the Benefit of Synthetic Training Data for Various Automatic Speech Recognition Architectures

no code implementations · 12 Apr 2021 · Nick Rossenbach, Mohammad Zeineldeen, Benedikt Hilmes, Ralf Schlüter, Hermann Ney

We achieve a final word-error-rate of 3.3%/10.0% with a hybrid system on the clean/noisy test-sets, surpassing any previous state-of-the-art systems on Librispeech-100h that do not include unlabeled audio data.

Tasks: Automatic Speech Recognition (ASR) +2

On Architectures and Training for Raw Waveform Feature Extraction in ASR

no code implementations · 9 Apr 2021 · Peter Vieting, Christoph Lüscher, Wilfried Michel, Ralf Schlüter, Hermann Ney

With the success of neural network based modeling in automatic speech recognition (ASR), many studies investigated acoustic modeling and learning of feature extractors directly based on the raw waveform.

Tasks: Automatic Speech Recognition (ASR) +2

A study of latent monotonic attention variants

no code implementations · 30 Mar 2021 · Albert Zeyer, Ralf Schlüter, Hermann Ney

We compare several monotonic latent models to our global soft attention baseline such as a hard attention model, a local windowed soft attention model, and a segmental soft attention model.

Tasks: Hard Attention, Speech Recognition +1

Tight Integrated End-to-End Training for Cascaded Speech Translation

no code implementations · 24 Nov 2020 · Parnia Bahar, Tobias Bieschke, Ralf Schlüter, Hermann Ney

Direct speech translation is an alternative method to avoid error propagation; however, its performance is often behind the cascade system.

Tasks: Translation

Phoneme Based Neural Transducer for Large Vocabulary Speech Recognition

no code implementations · 30 Oct 2020 · Wei Zhou, Simon Berger, Ralf Schlüter, Hermann Ney

To join the advantages of classical and end-to-end approaches for speech recognition, we present a simple, novel and competitive approach for phoneme-based neural transducer modeling.

Tasks: Language Modelling, Speech Recognition +1

Investigation of Large-Margin Softmax in Neural Language Modeling

no code implementations · 20 May 2020 · Jingjing Huo, Yingbo Gao, Weiyue Wang, Ralf Schlüter, Hermann Ney

After that, we apply the best norm-scaling setup in combination with various margins and conduct neural language models rescoring experiments in automatic speech recognition.

Tasks: Automatic Speech Recognition (ASR) +3

A systematic comparison of grapheme-based vs. phoneme-based label units for encoder-decoder-attention models

1 code implementation · 19 May 2020 · Mohammad Zeineldeen, Albert Zeyer, Wei Zhou, Thomas Ng, Ralf Schlüter, Hermann Ney

Following the rationale of end-to-end modeling, CTC, RNN-T or encoder-decoder-attention models for automatic speech recognition (ASR) use graphemes or grapheme-based subword units based on e.g. byte-pair encoding (BPE).

Tasks: Automatic Speech Recognition (ASR) +1

A New Training Pipeline for an Improved Neural Transducer

1 code implementation · 19 May 2020 · Albert Zeyer, André Merboldt, Ralf Schlüter, Hermann Ney

We compare the original training criterion with the full marginalization over all alignments, to the commonly used maximum approximation, which simplifies, improves and speeds up our training.

Context-Dependent Acoustic Modeling without Explicit Phone Clustering

no code implementations · 15 May 2020 · Tina Raissi, Eugen Beck, Ralf Schlüter, Hermann Ney

In this work, we address a direct phonetic context modeling for the hybrid deep neural network (DNN)/HMM, that does not build on any phone clustering algorithm for the determination of the HMM state inventory.

Tasks: Automatic Speech Recognition (ASR) +2

The RWTH ASR System for TED-LIUM Release 2: Improving Hybrid HMM with SpecAugment

no code implementations · 2 Apr 2020 · Wei Zhou, Wilfried Michel, Kazuki Irie, Markus Kitza, Ralf Schlüter, Hermann Ney

We present a complete training pipeline to build a state-of-the-art hybrid HMM-based ASR system on the 2nd release of the TED-LIUM corpus.

Tasks: Data Augmentation

Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems

1 code implementation · 19 Dec 2019 · Nick Rossenbach, Albert Zeyer, Ralf Schlüter, Hermann Ney

We achieve improvements of up to 33% relative in word-error-rate (WER) over a strong baseline with data-augmentation in a low-resource environment (LibriSpeech-100h), closing the gap to a comparable oracle experiment by more than 50%.

Tasks: Automatic Speech Recognition (ASR) +3

LSTM Language Models for LVCSR in First-Pass Decoding and Lattice-Rescoring

no code implementations · 1 Jul 2019 · Eugen Beck, Wei Zhou, Ralf Schlüter, Hermann Ney

LSTM based language models are an important part of modern LVCSR systems as they significantly improve performance over traditional backoff language models.

Comparison of Lattice-Free and Lattice-Based Sequence Discriminative Training Criteria for LVCSR

no code implementations · 1 Jul 2019 · Wilfried Michel, Ralf Schlüter, Hermann Ney

This allows for a direct comparison of lattice-based and lattice-free sequence discriminative training criteria such as MMI and sMBR, both using the same language model during training.

Tasks: Automatic Speech Recognition (ASR) +2

Cumulative Adaptation for BLSTM Acoustic Models

no code implementations · 14 Jun 2019 · Markus Kitza, Pavel Golik, Ralf Schlüter, Hermann Ney

Further, i-vectors were used as an input to the neural network to perform instantaneous speaker and environment adaptation, providing 8% relative improvement in word error rate on the NIST Hub5 2000 evaluation test set.

Tasks: Acoustic Modelling, Automatic Speech Recognition +4

Language Modeling with Deep Transformers

no code implementations · 10 May 2019 · Kazuki Irie, Albert Zeyer, Ralf Schlüter, Hermann Ney

We explore deep autoregressive Transformer models in language modeling for speech recognition.

Tasks: Language Modelling, Speech Recognition +1

Analysis of Deep Clustering as Preprocessing for Automatic Speech Recognition of Sparsely Overlapping Speech

no code implementations · 9 May 2019 · Tobias Menne, Ilya Sklyar, Ralf Schlüter, Hermann Ney

In a more realistic ASR scenario the audio signal contains significant portions of single-speaker speech and only part of the signal contains speech of multiple competing speakers.

Tasks: Automatic Speech Recognition (ASR) +3

RWTH ASR Systems for LibriSpeech: Hybrid vs Attention -- w/o Data Augmentation

2 code implementations · 8 May 2019 · Christoph Lüscher, Eugen Beck, Kazuki Irie, Markus Kitza, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney

To the best knowledge of the authors, the results obtained when training on the full LibriSpeech training set, are the best published currently, both for the hybrid DNN/HMM and the attention-based systems.

Tasks: Automatic Speech Recognition (ASR) +2

Speaker Adapted Beamforming for Multi-Channel Automatic Speech Recognition

no code implementations · 19 Jun 2018 · Tobias Menne, Ralf Schlüter, Hermann Ney

The proposed adaptation approach is based on the integration of the beamformer, which includes the mask estimation network, and the acoustic model of the ASR system.

Tasks: Automatic Speech Recognition (ASR) +1

A Comprehensive Study of Deep Bidirectional LSTM RNNs for Acoustic Modeling in Speech Recognition

no code implementations · 22 Jun 2016 · Albert Zeyer, Patrick Doetsch, Paul Voigtlaender, Ralf Schlüter, Hermann Ney

On this task, we get our best result with an 8 layer bidirectional LSTM and we show that a pretraining scheme with layer-wise construction helps for deep LSTMs.

Tasks: Automatic Speech Recognition (ASR) +1
