no code implementations • 13 Mar 2024 • Wenjing Zhu, Sining Sun, Changhao Shan, Peng Fan, Qing Yang
Conformer-based attention models have become the de facto backbone model for Automatic Speech Recognition tasks.
no code implementations • 16 Dec 2023 • Zhaoxi Mu, Xinyu Yang, Sining Sun, Qing Yang
However, in the task of target speech extraction, certain elements of global and local semantic information in the reference speech, which are irrelevant to speaker identity, can lead to speaker confusion within the speech extraction network.
1 code implementation • 23 Oct 2023 • Peng Fan, Changhao Shan, Sining Sun, Qing Yang, Jianwei Zhang
Following the initial encoder, we introduce an intermediate CTC loss function to compute the label frame, enabling us to extract the key frames and blank frames for KFSA.
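As a rough illustration of that frame split, the sketch below labels each frame by the greedy argmax of its intermediate CTC posterior and separates blank frames from non-blank (key) frames. The function name `split_frames`, the blank index 0, and the rule "non-blank argmax means key frame" are assumptions made for illustration; the paper's exact criterion may differ.

```python
from typing import List, Tuple

BLANK_ID = 0  # assumption: index 0 is the CTC blank symbol


def split_frames(
    posteriors: List[List[float]], blank_id: int = BLANK_ID
) -> Tuple[List[int], List[int]]:
    """Return (key_frames, blank_frames) as lists of frame indices.

    posteriors[t][v] is the CTC posterior of symbol v at frame t.
    A frame is treated as a key frame when its most likely symbol
    is not the blank (a hypothetical rule, not taken from the paper).
    """
    key_frames: List[int] = []
    blank_frames: List[int] = []
    for t, frame in enumerate(posteriors):
        # Greedy per-frame label: index of the largest posterior.
        best = max(range(len(frame)), key=frame.__getitem__)
        if best == blank_id:
            blank_frames.append(t)
        else:
            key_frames.append(t)
    return key_frames, blank_frames


# Toy posteriors over 3 symbols (blank, 'a', 'b') for 4 frames.
toy = [
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.70, 0.20, 0.10],
    [0.20, 0.10, 0.70],
]
key, blank = split_frames(toy)
print(key, blank)
```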
no code implementations • 21 May 2023 • Shubo Lv, Xiong Wang, Sining Sun, Long Ma, Lei Xie
Complex real-world acoustic environments, especially those with a low signal-to-noise ratio (SNR), pose tremendous challenges to a keyword spotting (KWS) system.
no code implementations • 17 Jan 2023 • Zhanheng Yang, Sining Sun, Xiong Wang, Yike Zhang, Long Ma, Lei Xie
In this paper, we propose an efficient approach to obtain a high-quality contextual list for a unified streaming/non-streaming E2E model.
no code implementations • 3 Jul 2022 • Kun Wei, Yike Zhang, Sining Sun, Lei Xie, Long Ma
Then, during the training of the conversational ASR system, the extractor is frozen and used to extract the textual representation of the preceding speech, and this representation is fed as context to the ASR decoder through an attention mechanism.
no code implementations • 16 Feb 2022 • Kun Wei, Yike Zhang, Sining Sun, Lei Xie, Long Ma
Conversational automatic speech recognition (ASR) is the task of recognizing conversational speech involving multiple speakers.
no code implementations • 15 Sep 2021 • Songjun Cao, Yueteng Kang, Yanzhe Fu, Xiaoshuo Xu, Sining Sun, Yike Zhang, Long Ma
Under such a framework, the neural network is usually pre-trained with massive unlabeled data and then fine-tuned with limited labeled data.
no code implementations • 1 May 2020 • Baiji Liu, Songjun Cao, Sining Sun, Weibin Zhang, Long Ma
Experiments on AISHELL-1 data show that the proposed model, along with the training strategies, improves the character error rate (CER) of MoChA from 8.96% to 7.68% on the test set.
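To put the two CER figures in proportion, the short sketch below computes the relative reduction they imply. The helper name `relative_reduction` is hypothetical; the only inputs are the two percentages quoted above.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Relative error-rate reduction, in percent of the baseline."""
    return (baseline - improved) / baseline * 100.0


# CER values quoted in the abstract: 8.96% (MoChA baseline) and 7.68% (proposed).
reduction = relative_reduction(8.96, 7.68)
print(f"{reduction:.1f}% relative CER reduction")  # prints "14.3% relative CER reduction"
```

The absolute gain is 1.28 percentage points, which is one seventh of the baseline error.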
no code implementations • 7 Jun 2018 • Sining Sun, Ching-Feng Yeh, Mari Ostendorf, Mei-Yuh Hwang, Lei Xie
This paper explores the use of adversarial examples in training speech recognition systems to increase robustness of deep neural network acoustic models.
no code implementations • 7 Jun 2018 • Sining Sun, Ching-Feng Yeh, Mei-Yuh Hwang, Mari Ostendorf, Lei Xie
In this paper, we propose a domain adversarial training (DAT) algorithm to alleviate the accented speech recognition problem.
1 code implementation • 27 Mar 2018 • Ke Wang, Junbo Zhang, Sining Sun, Yujun Wang, Fei Xiang, Lei Xie
First, we study the effectiveness of different dereverberation networks (the generator in the GAN) and find that LSTM leads to a significant improvement compared with feed-forward DNN and CNN on our dataset.
no code implementations • MediaEval 2015 Workshop 2015 • Jingyong Hou, Van Tung Pham, Cheung-Chi Leung, Lei Wang, HaiHua Xu, Hang Lv, Lei Xie, Zhonghua Fu, Chongjia Ni, Xiong Xiao, Hongjie Chen, Shaofei Zhang, Sining Sun, Yougen Yuan, Pengcheng Li, Tin Lay Nwe, Sunil Sivadas, Bin Ma, Eng Siong Chng, Haizhou Li
This paper describes the system developed by the NNI team for the Query-by-Example Search on Speech Task (QUESST) in the MediaEval 2015 evaluation.
Ranked #9 on Keyword Spotting on QUESST