no code implementations • 20 Aug 2024 • Xucheng Wan, Naijun Zheng, Kai Liu, Huan Zhou
Contextualized ASR models have been demonstrated to effectively improve the recognition accuracy of uncommon phrases when a predefined phrase list is available.
no code implementations • 14 Jun 2024 • Naijun Zheng, Xucheng Wan, Kai Liu, Ziqing Du, Zhou Huan
Although contextualized automatic speech recognition (ASR) systems are commonly used to improve the recognition of uncommon words, their effectiveness is hindered by the inherent limitations of speech-text data availability.
Automatic Speech Recognition (ASR) +2
no code implementations • 6 May 2024 • Bingshen Mu, Yangze Li, Qijie Shao, Kun Wei, Xucheng Wan, Naijun Zheng, Huan Zhou, Lei Xie
Accents represent deviations from standard pronunciation norms, and the multi-task learning framework for simultaneous ASR and accent recognition (AR) has effectively addressed multi-accent scenarios, making it a prominent solution.
Automatic Speech Recognition (ASR) +4
no code implementations • 8 Apr 2024 • He Wang, Pengcheng Guo, Xucheng Wan, Huan Zhou, Lei Xie
Automatic lip-reading (ALR) aims to automatically transcribe spoken content from a speaker's silent lip motion captured in video.
no code implementations • 9 Mar 2023 • Kai Liu, Ziqing Du, Xucheng Wan, Huan Zhou
To mitigate the imperative SC issue, we reformulate the training objective and propose two novel loss schemes that exploit a reconstruction-improvement metric defined at the small-chunk level and leverage the distribution information associated with that metric.
no code implementations • 16 Jan 2023 • Kai Liu, Xucheng Wan, Ziqing Du, Huan Zhou
As a practical alternative to speech separation, target speaker extraction (TSE) aims to extract the speech of the desired speaker using an additional speaker cue extracted from that speaker.
no code implementations • 24 Sep 2022 • Ziqing Du, Kai Liu, Xucheng Wan, Huan Zhou
Overlapped speech detection (OSD) is critical for speech applications in multi-party conversation scenarios.
no code implementations • 24 Sep 2022 • Xucheng Wan, Kai Liu, Ziqing Du, Huan Zhou
To validate the effectiveness of our proposed model, extensive experiments are conducted on the DNS2020 dataset.