This paper describes a new baseline system for automatic speech recognition (ASR) in the CHiME-4 challenge. It aims to promote the development of noisy ASR in speech processing communities by providing 1) a state-of-the-art yet simplified single system whose performance is comparable to that of the complicated top systems in the challenge, and 2) a publicly available and reproducible recipe in the main repository of the Kaldi speech recognition toolkit. The proposed system adopts generalized eigenvalue (GEV) beamforming with bidirectional long short-term memory (BLSTM) mask estimation. In addition, the proposed baseline recipe includes four speech enhancement measures for the simulation test set: short-time objective intelligibility (STOI), extended STOI (eSTOI), perceptual evaluation of speech quality (PESQ), and speech distortion ratio (SDR).
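To make the mask-based GEV beamforming step more concrete, the following is a minimal NumPy sketch, not the Kaldi recipe's implementation. It assumes the time-frequency masks are already available (in the described system they would come from the BLSTM mask estimator, which is not shown), and the function names `masked_covariance`, `gev_weights`, and `apply_beamformer` as well as the `loading` parameter are hypothetical names chosen for this illustration. For each frequency bin, the beamformer is taken as the principal eigenvector of the generalized eigenvalue problem between the speech and noise spatial covariance matrices, which maximizes the output signal-to-noise ratio. Any post-filtering or normalization of the beamformer output is omitted.

```python
import numpy as np


def masked_covariance(stft, mask):
    """Mask-weighted spatial covariance per frequency bin.

    stft: complex array of shape (freq, frames, channels).
    mask: real array of shape (freq, frames) with values in [0, 1].
    Returns an array of shape (freq, channels, channels).
    """
    weighted = stft * mask[..., None]
    # Sum over frames of mask * y * y^H for every frequency bin.
    cov = np.einsum("ftc,ftd->fcd", weighted, stft.conj())
    norm = np.maximum(mask.sum(axis=1), 1e-10)
    return cov / norm[:, None, None]


def gev_weights(phi_speech, phi_noise, loading=1e-6):
    """Principal generalized eigenvector of (phi_speech, phi_noise) per bin.

    `loading` is an assumed diagonal-loading factor that keeps the noise
    covariance positive definite; it is not taken from the paper.
    """
    n_freq, n_ch, _ = phi_speech.shape
    weights = np.empty((n_freq, n_ch), dtype=complex)
    for f in range(n_freq):
        scale = max(np.trace(phi_noise[f]).real / n_ch, 1e-10)
        noise = phi_noise[f] + loading * scale * np.eye(n_ch)
        # Whiten with the Cholesky factor: noise = chol @ chol^H.
        chol_inv = np.linalg.inv(np.linalg.cholesky(noise))
        whitened = chol_inv @ phi_speech[f] @ chol_inv.conj().T
        # eigh returns eigenvalues in ascending order; keep the largest.
        _, vecs = np.linalg.eigh(whitened)
        # Map the eigenvector back to the unwhitened domain.
        weights[f] = chol_inv.conj().T @ vecs[:, -1]
    return weights


def apply_beamformer(weights, stft):
    """Compute w^H y for every time-frequency bin; returns (freq, frames)."""
    return np.einsum("fc,ftc->ft", weights.conj(), stft)
```

A typical call sequence under these assumptions would be `w = gev_weights(masked_covariance(Y, speech_mask), masked_covariance(Y, noise_mask))` followed by `apply_beamformer(w, Y)`, where `Y` is the multichannel STFT. The Cholesky whitening is used here only to stay within NumPy; a generalized Hermitian eigensolver would serve the same purpose.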
| Task | Dataset | Model | Metric name | Metric value | Global rank |
|---|---|---|---|---|---|
| Distant Speech Recognition | CHiME-4 real 6ch | HMM-TDNN(LFMMI) + LSTMLM + NN-GEV | Word Error Rate (WER) | 2.74 | # 1 |
| Noisy Speech Recognition | CHiME real | HMM-TDNN(LFMMI) + LSTMLM | Percentage error | 11.4 | # 1 |
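For readers unfamiliar with the word error rate metric reported above, here is a minimal sketch of how it is commonly computed: the word-level edit distance (substitutions, deletions, and insertions) between a reference and a hypothesis transcript, divided by the number of reference words and expressed as a percentage. The function name `word_error_rate` is a hypothetical name for this illustration. This is not the challenge's scoring tool, and it applies no text normalization beyond whitespace splitting.

```python
def word_error_rate(reference, hypothesis):
    """Return the WER in percent between two whitespace-tokenized strings."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)
```

For example, comparing the reference "the cat sat on the mat" with the hypothesis "the cat sit on mat" gives one substitution and one deletion over six reference words.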