no code implementations • 8 Apr 2022 • Nick J. C. Wang, Zongfeng Quan, Shaojun Wang, Jing Xiao
The Conformer model is a strong architecture for speech recognition that trains its parameters with a hybrid loss combining connectionist temporal classification (CTC) and attention.
no code implementations • 8 Apr 2022 • Nick J. C. Wang, Shaojun Wang, Jing Xiao
In this paper, we compare different ways to combine ASR and NLU, in particular using a single Conformer model with different ways to use its components, to better understand the strengths and weaknesses of each approach.
no code implementations • 7 Apr 2022 • Nick J. C. Wang, Lu Wang, Yandan Sun, Haimei Kang, Dejun Zhang
We revisit ideas presented by Lugosch et al. using speech pre-training and three-module modeling; however, to ease construction of the end-to-end SLU model, we use as our phoneme module an open-source acoustic-phonetic model from a DNN-HMM hybrid automatic speech recognition (ASR) system instead of training one from scratch.
Automatic Speech Recognition (ASR)