RWTH ASR Systems for LibriSpeech: Hybrid vs Attention -- w/o Data Augmentation

8 May 2019 · Christoph Lüscher, Eugen Beck, Kazuki Irie, Markus Kitza, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney

We present state-of-the-art automatic speech recognition (ASR) systems employing a standard hybrid DNN/HMM architecture compared to an attention-based encoder-decoder design for the LibriSpeech task. Detailed descriptions of the system development, including model design, pretraining schemes, training schedules, and optimization approaches are provided for both system architectures...


Evaluation results from the paper


| Task | Dataset | Model | Metric name | Metric value | Global rank |
|---|---|---|---|---|---|
| Speech Recognition | LibriSpeech test-clean | 6 layer BLSTM with LSTM transformer rescoring | Word Error Rate (WER), % | 2.70 | #2 |
| Speech Recognition | LibriSpeech test-other | 6 layer BLSTM with LSTM transformer rescoring | Word Error Rate (WER), % | 5.70 | #1 |
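
The results above are reported as word error rate (WER): the word-level edit distance (substitutions, deletions, and insertions) between the recognizer output and the reference transcript, divided by the number of reference words. The sketch below is a minimal illustration of that definition for a single utterance. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example, not the scoring tool used in the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the WER in percent for one reference/hypothesis pair.

    Assumption for this sketch: words are separated by whitespace and
    compared exactly (no text normalization is applied).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over six words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Test-set WER is conventionally computed by summing the errors over all utterances and dividing by the total number of reference words, rather than by averaging per-utterance values.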