Deep Recurrent Neural Networks for Acoustic Modelling

7 Apr 2015  ·  William Chan, Ian Lane ·

We present a novel deep Recurrent Neural Network (RNN) model for acoustic modelling in Automatic Speech Recognition (ASR). We term our contribution as a TC-DNN-BLSTM-DNN model, the model combines a Deep Neural Network (DNN) with Time Convolution (TC), followed by a Bidirectional Long Short-Term Memory (BLSTM), and a final DNN. The first DNN acts as a feature processor to our model, the BLSTM then generates a context from the sequence acoustic signal, and the final DNN takes the context and models the posterior probabilities of the acoustic states. We achieve a 3.47 WER on the Wall Street Journal (WSJ) eval92 task or more than 8% relative improvement over the baseline DNN models.

PDF Abstract

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


Task Dataset Model Metric Name Metric Value Global Rank Result Benchmark
Speech Recognition WSJ eval92 TC-DNN-BLSTM-DNN Word Error Rate (WER) 3.5 # 9

Methods