no code implementations • 14 Dec 2024 • Yang Liu, Li Wan, Yiteng Huang, Ming Sun, Yangyang Shi, Florian Metze
Deep learning models such as convolutional neural networks (CNNs) and transformers have shown impressive capabilities in speech verification and have gained considerable attention in the research community.
no code implementations • 17 Sep 2024 • Yufeng Yang, Desh Raj, Ju Lin, Niko Moritz, Junteng Jia, Gil Keren, Egor Lakomkin, Yiteng Huang, Jacob Donley, Jay Mahadeokar, Ozlem Kalinli
On the conversational ASR task in particular, our model, using only 8 hours of labeled speech, outperforms a supervised ASR baseline trained on 2,000 hours of labeled data, demonstrating the effectiveness of our approach.
no code implementations • 13 Sep 2024 • Anfeng Xu, Biqiao Zhang, Shuyu Kong, Yiteng Huang, Zhaojun Yang, Sangeeta Srivastava, Ming Sun
Keyword spotting (KWS) is an important speech processing component for smart devices with voice-assistant capabilities.
no code implementations • 27 Aug 2024 • Zhenyu Wang, Shuyu Kong, Li Wan, Biqiao Zhang, Yiteng Huang, Mumin Jin, Ming Sun, Xin Lei, Zhaojun Yang
Existing keyword spotting (KWS) systems primarily rely on predefined keyword phrases.
no code implementations • 23 Aug 2024 • Zhenyu Wang, Li Wan, Biqiao Zhang, Yiteng Huang, Shang-Wen Li, Ming Sun, Xin Lei, Zhaojun Yang
A keyword spotting (KWS) engine that runs continuously on device is exposed to a wide variety of speech signals, most of which it has not seen before.
no code implementations • 18 Jan 2024 • Ju Lin, Niko Moritz, Yiteng Huang, Ruiming Xie, Ming Sun, Christian Fuegen, Frank Seide
Wearable devices such as smart glasses are approaching the compute capability needed to seamlessly generate real-time closed captions for live conversations.
Automatic Speech Recognition (ASR)
no code implementations • 8 Jan 2024 • Yang Liu, Li Wan, Yun Li, Yiteng Huang, Ming Sun, James Luan, Yangyang Shi, Xin Lei
Despite the potential of diffusion models in speech enhancement, their deployment in Acoustic Echo Cancellation (AEC) has been restricted.
no code implementations • 17 Feb 2023 • Vinicius Ribeiro, Yiteng Huang, Yuan Shangguan, Zhaojun Yang, Li Wan, Ming Sun
The third approach, which we propose, is a hybrid solution in which the model is trained on a small set of aligned data and then fine-tuned on a sizeable unaligned dataset.
no code implementations • 9 Nov 2022 • Haichuan Yang, Zhaojun Yang, Li Wan, Biqiao Zhang, Yangyang Shi, Yiteng Huang, Ivaylo Enchev, Limin Tang, Raziel Alvarez, Ming Sun, Xin Lei, Raghuraman Krishnamoorthi, Vikas Chandra
This paper proposes a hardware-efficient architecture for keyword spotting, the Linearized Convolution Network (LiCo-Net).
no code implementations • 17 May 2022 • Joe Caroselli, Arun Narayanan, Yiteng Huang
The first is the Context Aware Beamformer, which uses the noise context and the detected hotword to determine how to target the desired speaker.
Automatic Speech Recognition (ASR)