Search Results for author: Albert Zeyer

Found 19 papers, 10 papers with code

Chunked Attention-based Encoder-Decoder Model for Streaming Speech Recognition

no code implementations · 15 Sep 2023 · Mohammad Zeineldeen, Albert Zeyer, Ralf Schlüter, Hermann Ney

We study a streamable attention-based encoder-decoder model in which either the decoder, or both the encoder and decoder, operate on pre-defined, fixed-size windows called chunks.
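
As a rough illustration of the chunk idea (not the authors' implementation), the sketch below restricts a toy cross-attention so that each decoder step only sees the encoder frames of its assigned chunk; the tensor shapes, the chunk_size, and the chunk assignment are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def chunked_cross_attention(q, k, v, chunk_assignment, chunk_size):
    """Toy chunk-restricted cross-attention (illustrative sketch only).

    q: (T_dec, D) decoder queries
    k, v: (T_enc, D) encoder keys/values
    chunk_assignment: (T_dec,) chunk index each decoder step may read from
    chunk_size: number of encoder frames per fixed-size chunk (assumed)
    """
    T_enc = k.shape[0]
    enc_chunk = torch.arange(T_enc) // chunk_size          # chunk id per encoder frame
    # mask[i, j] is True where decoder step i may attend to encoder frame j
    mask = enc_chunk.unsqueeze(0) == chunk_assignment.unsqueeze(1)
    scores = q @ k.T / k.shape[-1] ** 0.5                  # (T_dec, T_enc)
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v                   # (T_dec, D)

# 10 decoder steps reading from 40 encoder frames split into chunks of 8.
q, k, v = torch.randn(10, 64), torch.randn(40, 64), torch.randn(40, 64)
assignment = torch.arange(10) // 2   # two decoder steps per chunk (purely illustrative)
print(chunked_cross_attention(q, k, v, assignment, chunk_size=8).shape)  # (10, 64)
```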

Speech Recognition

Monotonic segmental attention for automatic speech recognition

1 code implementation · 26 Oct 2022 · Albert Zeyer, Robin Schmitt, Wei Zhou, Ralf Schlüter, Hermann Ney

We restrict the decoder attention to segments to avoid quadratic runtime of global attention, better generalize to long sequences, and eventually enable streaming.
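
A minimal way to see the runtime argument: with given segment boundaries, only the frames inside each segment are scored, so the cost grows with the total segment length rather than with T_dec × T_enc. This is a toy sketch with hand-fixed, assumed-monotonic boundaries, not the paper's model, where the segment boundaries are latent.

```python
import torch
import torch.nn.functional as F

def segmental_attention(q, k, v, seg_bounds):
    """Toy monotonic segmental attention with given segment boundaries.

    q: (T_dec, D) decoder queries, one per output label
    k, v: (T_enc, D) encoder keys/values
    seg_bounds: list of (start, end) encoder-frame ranges, one per label,
                assumed monotonic; in the paper these are latent, here fixed.
    Only frames inside each segment are scored, so the cost is proportional
    to the total segment length instead of T_dec * T_enc.
    """
    outputs = []
    for i, (s, e) in enumerate(seg_bounds):
        scores = q[i] @ k[s:e].T / k.shape[-1] ** 0.5   # (e - s,)
        alpha = F.softmax(scores, dim=-1)
        outputs.append(alpha @ v[s:e])                  # weighted sum over the segment
    return torch.stack(outputs)                         # (T_dec, D)

q, k, v = torch.randn(3, 64), torch.randn(30, 64), torch.randn(30, 64)
print(segmental_attention(q, k, v, [(0, 10), (8, 20), (18, 30)]).shape)  # (3, 64)
```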

Automatic Speech Recognition (ASR)

Why does CTC result in peaky behavior?

1 code implementation · 31 May 2021 · Albert Zeyer, Ralf Schlüter, Hermann Ney

The peaky behavior of CTC models is well known experimentally.
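
For context, the CTC criterion marginalizes over all blank-augmented alignments of the label sequence to the input frames; "peaky" refers to the trained frame-wise posteriors putting almost all probability on the blank label except at isolated spikes. A standard, generic (not paper-specific) form of the criterion:

```latex
% B collapses an alignment y_1^T (labels plus blank) to the label sequence a_1^S
% by removing label repetitions and blanks.
\log p(a_1^S \mid x_1^T)
  = \log \sum_{y_1^T:\, \mathcal{B}(y_1^T) = a_1^S} \; \prod_{t=1}^{T} p_t(y_t \mid x_1^T)
```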

Equivalence of Segmental and Neural Transducer Modeling: A Proof of Concept

no code implementations · 13 Apr 2021 · Wei Zhou, Albert Zeyer, André Merboldt, Ralf Schlüter, Hermann Ney

With the advent of direct models in automatic speech recognition (ASR), the formerly prevalent frame-wise acoustic modeling based on hidden Markov models (HMM) diversified into a number of modeling architectures like encoder-decoder attention models, transducer models and segmental models (direct HMM).

Automatic Speech Recognition (ASR) +1

A study of latent monotonic attention variants

no code implementations · 30 Mar 2021 · Albert Zeyer, Ralf Schlüter, Hermann Ney

We compare several monotonic latent models to our global soft attention baseline such as a hard attention model, a local windowed soft attention model, and a segmental soft attention model.
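
To make the contrast with soft attention concrete, here is a toy hard-attention step: each decoder position reads exactly one encoder frame instead of a soft mixture over all of them. The greedy argmax below is only a crude illustrative stand-in for the latent-variable treatment in the paper (marginalization or maximization over positions), and it does not enforce monotonicity.

```python
import torch

def greedy_hard_attention(q, k, v):
    """Toy hard attention: each decoder step reads exactly one encoder frame.

    The position is picked by a greedy argmax over the score matrix, a crude
    stand-in for a proper latent-position model (illustrative assumption).
    """
    scores = q @ k.T / k.shape[-1] ** 0.5   # (T_dec, T_enc)
    pos = scores.argmax(dim=-1)             # one encoder position per decoder step
    return v[pos], pos                      # contexts (T_dec, D) and chosen positions

q, k, v = torch.randn(4, 64), torch.randn(30, 64), torch.randn(30, 64)
context, positions = greedy_hard_attention(q, k, v)
print(context.shape, positions.shape)       # torch.Size([4, 64]) torch.Size([4])
```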

Hard Attention, Speech Recognition +1

A systematic comparison of grapheme-based vs. phoneme-based label units for encoder-decoder-attention models

1 code implementation · 19 May 2020 · Mohammad Zeineldeen, Albert Zeyer, Wei Zhou, Thomas Ng, Ralf Schlüter, Hermann Ney

Following the rationale of end-to-end modeling, CTC, RNN-T or encoder-decoder-attention models for automatic speech recognition (ASR) use graphemes or grapheme-based subword units based on e.g. byte-pair encoding (BPE).
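
For illustration, the sketch below applies an already-learned, hand-picked list of BPE merges to a single word, showing how a grapheme sequence collapses into subword units. The merges are invented for the example; real toolkits also learn them from a corpus.

```python
def apply_bpe(word, merges):
    """Toy application of a fixed, already-learned BPE merge list.

    word: string; merges: ordered list of symbol pairs to merge in sequence.
    """
    symbols = list(word)                       # start from single graphemes
    for a, b in merges:
        i, merged = 0, []
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                merged.append(a + b)           # fuse the pair into one subword
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols

# Illustrative merges, not learned from any real corpus.
print(apply_bpe("speech", [("s", "p"), ("e", "e"), ("sp", "ee"), ("c", "h")]))
# ['spee', 'ch']
```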

Automatic Speech Recognition (ASR) +1

A New Training Pipeline for an Improved Neural Transducer

1 code implementation · 19 May 2020 · Albert Zeyer, André Merboldt, Ralf Schlüter, Hermann Ney

We compare the original training criterion with the full marginalization over all alignments, to the commonly used maximum approximation, which simplifies, improves and speeds up our training.
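
In generic notation (not taken from the paper), the two criteria differ only in whether the set of alignments is summed over or maximized over: for an alignment set A(a_1^S, T) of the label sequence a_1^S to T frames,

```latex
\mathcal{L}_{\text{full}} = -\log \sum_{y_1^T \in A(a_1^S, T)} p(y_1^T \mid x_1^T),
\qquad
\mathcal{L}_{\text{max}} = -\log \max_{y_1^T \in A(a_1^S, T)} p(y_1^T \mid x_1^T)
```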

Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems

1 code implementation · 19 Dec 2019 · Nick Rossenbach, Albert Zeyer, Ralf Schlüter, Hermann Ney

We achieve improvements of up to 33% relative in word error rate (WER) over a strong baseline with data augmentation in a low-resource environment (LibriSpeech-100h), closing the gap to a comparable oracle experiment by more than 50%.

Automatic Speech Recognition (ASR) +3

Language Modeling with Deep Transformers

no code implementations · 10 May 2019 · Kazuki Irie, Albert Zeyer, Ralf Schlüter, Hermann Ney

We explore deep autoregressive Transformer models in language modeling for speech recognition.
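
A minimal sketch of an autoregressive Transformer LM is given below; the paper's models are much deeper and trained on ASR transcripts, and all hyperparameters, names and shapes here are placeholders.

```python
import torch
import torch.nn as nn

class TinyTransformerLM(nn.Module):
    """Minimal autoregressive Transformer language model (sketch only)."""

    def __init__(self, vocab_size, d_model=256, nhead=4, num_layers=6, max_len=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=1024,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):                       # tokens: (B, T) integer ids
        T = tokens.shape[1]
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool,
                                       device=tokens.device), diagonal=1)
        x = self.embed(tokens) + self.pos(torch.arange(T, device=tokens.device))
        h = self.encoder(x, mask=causal)             # future positions are masked out
        return self.out(h)                           # (B, T, vocab) next-token logits

lm = TinyTransformerLM(vocab_size=1000)
print(lm(torch.randint(0, 1000, (2, 16))).shape)     # torch.Size([2, 16, 1000])
```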

Language Modelling, Speech Recognition +1

RWTH ASR Systems for LibriSpeech: Hybrid vs Attention -- w/o Data Augmentation

2 code implementations · 8 May 2019 · Christoph Lüscher, Eugen Beck, Kazuki Irie, Markus Kitza, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney

To the best knowledge of the authors, the results obtained when training on the full LibriSpeech training set are currently the best published, both for the hybrid DNN/HMM and the attention-based systems.

Automatic Speech Recognition (ASR) +2

RETURNN as a Generic Flexible Neural Toolkit with Application to Translation and Speech Recognition

3 code implementations · ACL 2018 · Albert Zeyer, Tamer Alkhouli, Hermann Ney

We compare the fast training and decoding speed of RETURNN attention models for translation, enabled by fast CUDA LSTM kernels and a fast pure TensorFlow beam search decoder.
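
For reference, a label-synchronous beam search of the kind such a decoder implements can be written down in a few lines. The step_fn interface below is hypothetical and stands in for the decoder's next-label distribution; it is not RETURNN's actual API.

```python
import math

def beam_search(step_fn, bos, eos, beam_size, max_len):
    """Generic label-synchronous beam search sketch.

    step_fn(prefix) returns a list of (token, log_prob) continuations
    (a hypothetical interface standing in for the real decoder).
    """
    beams = [([bos], 0.0)]                                   # (prefix, log score)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for token, logp in step_fn(prefix):
                candidates.append((prefix + [token], score + logp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_size]:
            (finished if prefix[-1] == eos else beams).append((prefix, score))
        if not beams:                                        # all hypotheses ended
            break
    return max(finished + beams, key=lambda c: c[1])

# Toy 3-token model: always prefers token 1, may end with eos=2.
toy = lambda prefix: [(1, math.log(0.6)), (2, math.log(0.3)), (0, math.log(0.1))]
print(beam_search(toy, bos=3, eos=2, beam_size=2, max_len=5))
```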

Speech Recognition +1

Improved training of end-to-end attention models for speech recognition

14 code implementations · 8 May 2018 · Albert Zeyer, Kazuki Irie, Ralf Schlüter, Hermann Ney

Sequence-to-sequence attention-based models on subword units allow simple open-vocabulary end-to-end speech recognition.

Ranked #43 on Speech Recognition on LibriSpeech test-clean (using extra training data)

Language Modelling, Speech Recognition

A Comprehensive Study of Deep Bidirectional LSTM RNNs for Acoustic Modeling in Speech Recognition

no code implementations · 22 Jun 2016 · Albert Zeyer, Patrick Doetsch, Paul Voigtlaender, Ralf Schlüter, Hermann Ney

On this task, we get our best result with an 8-layer bidirectional LSTM, and we show that a pretraining scheme with layer-wise construction helps for deep LSTMs.
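
A sketch of what layer-wise construction means in practice: start with one bidirectional LSTM layer, train briefly, then stack the next layer on top of the already-trained ones and repeat. The dimensions and the train_fn hook below are placeholders, not the paper's recipe.

```python
import torch
import torch.nn as nn

def grow_blstm_stack(input_dim, hidden_dim, target_depth, train_fn):
    """Layer-wise construction of a deep bidirectional LSTM stack (sketch).

    train_fn(layers) is a placeholder for a short training run of the
    current, partially built stack.
    """
    layers = nn.ModuleList()
    for depth in range(target_depth):
        in_dim = input_dim if depth == 0 else 2 * hidden_dim   # 2x for bidirectional
        layers.append(nn.LSTM(in_dim, hidden_dim, bidirectional=True, batch_first=True))
        train_fn(layers)                                       # pretrain current stack

    def forward(x):                                            # x: (B, T, input_dim)
        for lstm in layers:
            x, _ = lstm(x)
        return x                                               # (B, T, 2*hidden_dim)

    return forward

# Dummy "training" hook just to make the sketch runnable.
stack = grow_blstm_stack(40, 128, target_depth=3, train_fn=lambda layers: None)
print(stack(torch.randn(2, 50, 40)).shape)  # torch.Size([2, 50, 256])
```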

Automatic Speech Recognition (ASR) +1
