no code implementations • 26 Jul 2023 • Tian-Hao Zhang, Dinghao Zhou, Guiping Zhong, Jiaming Zhou, Baoxiang Li
RNN-T models, which are widely used in ASR, rely on the RNN-T loss to align the lengths of the input audio and the target sequence.
no code implementations • 1 Jul 2022 • Song Zhang, Ken Zheng, Xiaoxu Zhu, Baoxiang Li
Grapheme-to-phoneme (G2P) conversion is an indispensable part of a Mandarin Chinese text-to-speech (TTS) system. The core of G2P conversion is polyphone disambiguation: picking the correct pronunciation from several candidates for a polyphonic Chinese character.
no code implementations • 29 Mar 2022 • Jingyu Sun, Guiping Zhong, Dinghao Zhou, Baoxiang Li
To improve the performance of the streaming model and reduce its computational complexity, this paper employs a frame-level model with an efficient augmented memory transformer block and a dynamic latency training method for streaming automatic speech recognition.
Automatic Speech Recognition (ASR)