End-to-end speech recognition using lattice-free MMI

Interspeech 2018. Hossein Hadian, Hossein Sameti, Daniel Povey, Sanjeev Khudanpur

We present our work on end-to-end training of acoustic models using the lattice-free maximum mutual information (LF-MMI) objective function in the context of hidden Markov models. By end-to-end training, we mean flat-start training of a single DNN in one stage without using any previously trained models, forced alignments, or building state-tying decision trees...
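For orientation, the MMI criterion that the abstract refers to is usually written in the following general form. This is a sketch of the standard textbook objective, not a formula taken from this page; the symbols ($\mathbf{O}_u$ for the acoustic observations of utterance $u$, $w_u$ for its reference transcript, $\mathbb{M}_w$ for the HMM corresponding to word sequence $w$, and $P(w)$ for the language-model prior) are assumptions chosen for illustration.

```latex
\mathcal{F}_{\mathrm{MMI}}
  = \sum_{u=1}^{U}
    \log
    \frac{p(\mathbf{O}_u \mid \mathbb{M}_{w_u})\, P(w_u)}
         {\sum_{w} p(\mathbf{O}_u \mid \mathbb{M}_{w})\, P(w)}
```

The numerator scores the reference transcript, and the denominator sums over competing word sequences. In the lattice-free variant, that sum is computed over a full denominator graph rather than being approximated with per-utterance lattices.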

Evaluation results from the paper


Task                Dataset              Model              Metric name            Metric value  Global rank
Speech Recognition  Switchboard (300hr)  End-to-end LF-MMI  Word Error Rate (WER)  9.3           #1
Speech Recognition  WSJ eval92           End-to-end LF-MMI  Percentage error       3.0           #2
Speech Recognition  WSJ eval92           End-to-end LF-MMI  Word Error Rate (WER)  3.0           #1
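The WER figures in the table are conventionally computed as the word-level edit distance between hypothesis and reference, divided by the number of reference words. Below is a minimal sketch, assuming whitespace tokenization and no text normalization. The function name `word_error_rate` is illustrative and is not taken from the paper or its toolkit.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: 100 * (substitutions + deletions + insertions) / N.

    N is the number of words in the reference. Tokenization by whitespace is
    an assumption made for this sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

For a whole test set, the usual practice is to sum the edit counts and the reference word counts over all utterances before dividing, rather than averaging per-utterance rates.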