no code implementations • 26 May 2022 • Qiu-Shi Zhu, Jie Zhang, Zi-Qiang Zhang, Li-Rong Dai
Speech enhancement (SE) is usually required as a front end to improve speech quality in noisy environments, but the enhanced speech might not be optimal for automatic speech recognition (ASR) systems because of the speech distortion it introduces.
Automatic Speech Recognition (ASR)
no code implementations • 15 Feb 2022 • Zi-Qiang Zhang, Jie Zhang, Jian-Shu Zhang, Ming-Hui Wu, Xin Fang, Li-Rong Dai
The proposed approach explores both the complementarity of audio-visual modalities and long-term context dependency using a transformer-based fusion module and a flexible masking strategy.
no code implementations • 22 Jan 2022 • Qiu-Shi Zhu, Jie Zhang, Zi-Qiang Zhang, Ming-Hui Wu, Xin Fang, Li-Rong Dai
In this work, we therefore first analyze the noise robustness of wav2vec2.0 via experiments.
Automatic Speech Recognition (ASR)
no code implementations • 15 Mar 2021 • Zi-Qiang Zhang, Yan Song, Ming-Hui Wu, Xin Fang, Li-Rong Dai
In this paper, we propose a weakly supervised multilingual representation learning framework, called cross-lingual self-training (XLST).