no code implementations • 9 Mar 2024 • Yudong Yang, Rongfeng Su, Xiaokang Liu, Nan Yan, Lan Wang
In this model, the inherent acoustic characteristics of individuals related to the tongue motion details are encoded by using wav2vec 2.0, while the ASR transcriptions related to the universality of tongue motions are encoded by using BERT.
no code implementations • 28 Mar 2022 • Mengzhe Geng, Xurong Xie, Rongfeng Su, Jianwei Yu, Zengrui Jin, Tianzi Wang, Shujie Hu, Zi Ye, Helen Meng, Xunying Liu
Accurate recognition of dysarthric and elderly speech remains a challenging task to date.
no code implementations • 18 Aug 2021 • Jin Li, Rongfeng Su, Xurong Xie, Nan Yan, Lan Wang
The shallow stream is used to acquire traditional shallow features that are beneficial for the classification of phones or words, while the deep stream is used to obtain utterance-level speaker-invariant deep features for improving the feature diversity.
Automatic Speech Recognition (ASR) +2