no code implementations • 30 Aug 2024 • Mohan Shi, Zengrui Jin, Yaoxun Xu, Yong Xu, Shi-Xiong Zhang, Kun Wei, Yiwen Shao, Chunlei Zhang, Dong Yu
Recognizing overlapping speech from multiple speakers in conversational scenarios is one of the most challenging problems for automatic speech recognition (ASR).
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
no code implementations • 1 Jul 2024 • Thomas Cherico Wanger, Estelle Raveloaritiana, Siyan Zeng, Haixiu Gao, Xueqing He, Yiwen Shao, Panlong Wu, Kris A. G. Wyckhuys, Wenwu Zhou, Yi Zou, Zengrong Zhu, Ling Li, Haiyan Cen, Yunhui Liu, Shenggen Fan
While a strategic choice and management of crop and livestock species can improve nutrition, the environmental and production benefits of agricultural diversification are currently not well interlinked with China's food and nutrition security discussions.
no code implementations • 13 Jun 2024 • Yiwen Shao, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu, Daniel Povey, Sanjeev Khudanpur
In the field of multi-channel, multi-speaker Automatic Speech Recognition (ASR), the task of discerning and accurately transcribing a target speaker's speech within background noise remains a formidable challenge.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
no code implementations • 31 Oct 2023 • Yiwen Shao, Shi-Xiong Zhang, Dong Yu
Automatic speech recognition (ASR) on multi-talker recordings is challenging.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +3
no code implementations • 25 Oct 2023 • Zili Huang, Yiwen Shao, Shi-Xiong Zhang, Dong Yu
2) Multi-Task Capability: Beyond the single-task focus of previous systems, UniX-Encoder acts as a robust upstream model, adeptly extracting features for diverse tasks including ASR and speaker recognition.
no code implementations • 5 Oct 2023 • Yiwen Shao
Multi-channel multi-talker speech recognition presents formidable challenges in the realm of speech processing, marked by issues such as background noise, reverberation, and overlapping speech.
no code implementations • 8 Apr 2022 • Sonal Joshi, Saurabh Kataria, Yiwen Shao, Piotr Zelasko, Jesus Villalba, Sanjeev Khudanpur, Najim Dehak
We propose three defenses: a denoiser pre-processor, adversarial fine-tuning of the ASR model, and adversarial fine-tuning of a joint model of ASR and denoiser.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
no code implementations • 22 Nov 2021 • Yiwen Shao, Shi-Xiong Zhang, Dong Yu
Experimental results show that 1) the proposed ALL-In-One model achieved a comparable error rate to the pipelined system while reducing the inference time by half; 2) the proposed 3D spatial feature significantly outperformed (31% CERR) all previous works that use 1D directional information in both paradigms.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
no code implementations • 31 Mar 2021 • Piotr Żelasko, Sonal Joshi, Yiwen Shao, Jesus Villalba, Jan Trmal, Najim Dehak, Sanjeev Khudanpur
We investigate two threat models: a denial-of-service scenario where fast gradient-sign method (FGSM) or weak projected gradient descent (PGD) attacks are used to degrade the model's word error rate (WER); and a targeted scenario where a more potent imperceptible attack forces the system to recognize a specific phrase.
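As a rough illustration of the fast gradient-sign method named above, the sketch below perturbs an input by a fixed step in the direction of the sign of the loss gradient. It is not the authors' setup: the toy logistic model, its weights `w` and `b`, the input `x`, and the step size `epsilon` are all hypothetical stand-ins for an ASR system and its audio input.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_and_input_grad(w, b, x, y):
    """Binary cross-entropy of a toy logistic model and its gradient w.r.t. the input x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = sigmoid(z)
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    # For logistic regression, d(loss)/d(x_i) = (p - y) * w_i.
    grad = [(p - y) * wi for wi in w]
    return loss, grad

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, epsilon):
    """One FGSM step: x_adv = x + epsilon * sign(grad_x loss)."""
    _, grad = loss_and_input_grad(w, b, x, y)
    return [xi + epsilon * sign(gi) for xi, gi in zip(x, grad)]

# Hypothetical model parameters and input (assumptions for illustration only).
w = [1.5, -2.0, 0.5]
b = 0.1
x = [0.2, -0.4, 1.0]
y = 1
epsilon = 0.1

x_adv = fgsm(w, b, x, y, epsilon)
clean_loss, _ = loss_and_input_grad(w, b, x, y)
adv_loss, _ = loss_and_input_grad(w, b, x_adv, y)
print("clean loss:", clean_loss)
print("adversarial loss:", adv_loss)
```

Each input coordinate moves by at most `epsilon`, and the loss on the perturbed input is higher than on the clean one. A PGD attack would repeat such steps several times, projecting back into the allowed perturbation range after each one.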
1 code implementation • 20 May 2020 • Yiwen Shao, Yiming Wang, Daniel Povey, Sanjeev Khudanpur
We present PyChain, a fully parallelized PyTorch implementation of end-to-end lattice-free maximum mutual information (LF-MMI) training for the so-called chain models in the Kaldi automatic speech recognition (ASR) toolkit.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
1 code implementation • 14 Feb 2020 • Zili Huang, Shinji Watanabe, Yusuke Fujita, Paola Garcia, Yiwen Shao, Daniel Povey, Sanjeev Khudanpur
Speaker diarization is an important pre-processing step for many speech applications, and it aims to solve the "who spoke when" problem.
1 code implementation • 18 Sep 2019 • Yiming Wang, Tongfei Chen, Hainan Xu, Shuoyang Ding, Hang Lv, Yiwen Shao, Nanyun Peng, Lei Xie, Shinji Watanabe, Sanjeev Khudanpur
We present Espresso, an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning library PyTorch and the popular neural machine translation toolkit fairseq.
Ranked #1 on Speech Recognition on Hub5'00 CallHome
Automatic Speech Recognition Automatic Speech Recognition (ASR) +6