no code implementations • 2 Nov 2022 • Rao Ma, Xiaobo Wu, Jin Qiu, Yanan Qin, HaiHua Xu, Peihao Wu, Zejun Ma
Compared with both shallow fusion and ILME-based LM fusion, the proposed method achieves significantly better performance on the target test sets while incurring minimal performance degradation on the general test set.
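For orientation, here is a minimal sketch (not this paper's implementation) of how shallow fusion and ILME-based fusion differ at decoding time; the toy distributions and interpolation weights are illustrative assumptions:

```python
import numpy as np

def shallow_fusion_score(log_p_asr, log_p_lm, lam_lm=0.3):
    """Shallow fusion: add a weighted external LM score to the ASR score."""
    return log_p_asr + lam_lm * log_p_lm

def ilme_fusion_score(log_p_asr, log_p_lm, log_p_ilm, lam_lm=0.3, lam_ilm=0.2):
    """ILME-based fusion: additionally subtract the estimated internal LM
    score, so the external LM replaces, rather than stacks on top of, the
    language prior the E2E model absorbed from its training transcripts."""
    return log_p_asr + lam_lm * log_p_lm - lam_ilm * log_p_ilm

# Toy next-token log-probabilities over a 4-word vocabulary (illustrative).
log_p_asr = np.log(np.array([0.5, 0.2, 0.2, 0.1]))
log_p_lm  = np.log(np.array([0.1, 0.6, 0.2, 0.1]))   # external target-domain LM
log_p_ilm = np.log(np.array([0.4, 0.3, 0.2, 0.1]))   # estimated internal LM

print(shallow_fusion_score(log_p_asr, log_p_lm))
print(ilme_fusion_score(log_p_asr, log_p_lm, log_p_ilm))
```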
1 code implementation • 1 Nov 2022 • Yuhang Yang, HaiHua Xu, Hao Huang, Eng Siong Chng, Sheng Li
For the state-of-the-art end-to-end ASR model to enjoy data efficiency, as well as much more unpaired text data via multi-modal training, two problems must be addressed: 1) the synchronicity of feature sampling rates between speech and language (i.e., text) data; 2) the homogeneity of the representations learned by the two encoders.
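A hypothetical sketch of both ingredients, assuming a strided-convolution length adaptor and a simple representation-matching loss; neither is confirmed as this paper's actual design:

```python
import torch
import torch.nn as nn

class LengthAdaptor(nn.Module):
    """Strided 1-D convolution that downsamples speech-encoder frames
    toward the (much lower) text token rate."""
    def __init__(self, dim=256, stride=4):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel_size=stride, stride=stride)

    def forward(self, x):              # x: (batch, frames, dim)
        return self.conv(x.transpose(1, 2)).transpose(1, 2)

def homogeneity_loss(speech_repr, text_repr):
    """Pull the two encoders' mean-pooled outputs together -- one simple
    way to encourage homogeneous representations across modalities."""
    return nn.functional.mse_loss(speech_repr.mean(1), text_repr.mean(1))

speech = torch.randn(2, 80, 256)    # 80 speech frames
text   = torch.randn(2, 20, 256)    # 20 text tokens
adapted = LengthAdaptor()(speech)   # -> (2, 20, 256), roughly token rate
loss = homogeneity_loss(adapted, text)
```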
no code implementations • 28 Oct 2022 • Yist Y. Lin, Tao Han, HaiHua Xu, Van Tung Pham, Yerbolat Khassanov, Tze Yuang Chong, Yi He, Lu Lu, Zejun Ma
One limitation of the end-to-end automatic speech recognition (ASR) framework is that its performance can be compromised when train and test utterance lengths are mismatched.
1 code implementation • 26 Oct 2022 • Hexin Liu, HaiHua Xu, Leibny Paola Garcia, Andy W. H. Khong, Yi He, Sanjeev Khudanpur
The comparison of the proposed methods indicates that incorporating language information is more effective than disentangling it for reducing language confusion in code-switching (CS) speech.
Automatic Speech Recognition
no code implementations • 9 Jul 2022 • Jicheng Zhang, Yizhou Peng, HaiHua Xu, Yi He, Eng Siong Chng, Hao Huang
Intermediate layer output (ILO) regularization, by means of multitask training on the encoder side, has been shown to be an effective approach to yielding improved results on a wide range of end-to-end ASR frameworks.
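A minimal sketch of the general idea, assuming a CTC-based setup with an auxiliary loss on one intermediate encoder layer; the choice of layer and the weight alpha are illustrative, not this paper's configuration:

```python
import torch
import torch.nn as nn

ctc = nn.CTCLoss(blank=0, zero_infinity=True)

def ilo_multitask_loss(inter_logits, final_logits, targets,
                       in_lens, tgt_lens, alpha=0.3):
    """Multitask objective: CTC on the final encoder output plus an
    auxiliary CTC loss on an intermediate layer's output."""
    lp_final = final_logits.log_softmax(-1).transpose(0, 1)  # (T, B, V)
    lp_inter = inter_logits.log_softmax(-1).transpose(0, 1)
    return ((1 - alpha) * ctc(lp_final, targets, in_lens, tgt_lens)
            + alpha * ctc(lp_inter, targets, in_lens, tgt_lens))

B, T, V = 2, 50, 30                      # batch, frames, vocab (blank = 0)
inter = torch.randn(B, T, V)             # logits from an intermediate layer
final = torch.randn(B, T, V)             # logits from the final layer
targets = torch.randint(1, V, (B, 12))
loss = ilo_multitask_loss(inter, final, targets,
                          torch.full((B,), T), torch.full((B,), 12))
```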
no code implementations • 9 Jul 2022 • Yizhou Peng, Yufei Liu, Jicheng Zhang, HaiHua Xu, Yi He, Hao Huang, Eng Siong Chng
More importantly, we train an end-to-end (E2E) speech recognition model by merging two monolingual data sets, and observe the efficacy of the proposed ILME-based LM fusion for code-switching speech recognition (CSSR).
no code implementations • 26 Jan 2022 • Yufei Liu, Rao Ma, HaiHua Xu, Yi He, Zejun Ma, Weibin Zhang
In this paper, we propose two novel approaches to estimating the ILM based on the Listen-Attend-Spell (LAS) framework.
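The snippet does not detail the two proposed estimation approaches; for orientation, here is a sketch of the commonly used zero-context baseline for ILM estimation in attention-based models, where the decoder is run with the acoustic context vector zeroed out. All module sizes are illustrative:

```python
import torch
import torch.nn as nn

class TinyLASDecoderStep(nn.Module):
    """One decoder step of a LAS-style model: consumes the previous
    token embedding and an attention context vector."""
    def __init__(self, vocab=100, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.rnn = nn.GRUCell(2 * dim, dim)
        self.out = nn.Linear(dim, vocab)

    def forward(self, prev_token, context, state):
        inp = torch.cat([self.embed(prev_token), context], dim=-1)
        state = self.rnn(inp, state)
        return self.out(state).log_softmax(-1), state

dec = TinyLASDecoderStep()
state = torch.zeros(1, 64)
prev = torch.tensor([3])

# Internal-LM estimate: run the decoder with the acoustic context zeroed,
# so the next-token score depends on the label history alone.
zero_ctx = torch.zeros(1, 64)
log_p_ilm, _ = dec(prev, zero_ctx, state)
```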
no code implementations • 7 Oct 2021 • Yizhou Peng, Jicheng Zhang, HaiHua Xu, Hao Huang, Eng Siong Chng
A non-autoregressive end-to-end ASR framework is potentially well suited to the code-switching recognition task, thanks to its inherent property that the current output token is independent of the previous ones.
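A minimal sketch of that independence property, using CTC greedy decoding as a representative non-autoregressive scheme; the input features are toy data:

```python
import numpy as np

def ctc_greedy_decode(log_probs, blank=0):
    """Non-autoregressive decoding: each frame's token is picked
    independently of the others (no conditioning on history), then
    repeats are collapsed and blanks removed."""
    best = log_probs.argmax(axis=-1)   # per-frame argmax, fully parallel
    out, prev = [], blank
    for t in best:
        if t != blank and t != prev:
            out.append(int(t))
        prev = t
    return out

frames = np.log(np.random.dirichlet(np.ones(5), size=10))  # 10 frames, 5 symbols
print(ctc_greedy_decode(frames))
```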
no code implementations • 22 Jul 2021 • Duo Ma, Nana Hou, Van Tung Pham, HaiHua Xu, Eng Siong Chng
One advantage of the proposed method is that the entire system can be trained from scratch.
Automatic Speech Recognition
no code implementations • 15 Jun 2021 • Jicheng Zhang, Yizhou Peng, Pham Van Tung, HaiHua Xu, Hao Huang, Eng Siong Chng
In this paper, we propose a single multi-task learning framework that performs end-to-end (E2E) speech recognition (ASR) and accent recognition (AR) simultaneously.
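A minimal sketch of such a single multitask objective, assuming a simple weighted interpolation of the two losses; the weight and tensor shapes are illustrative, not this paper's configuration:

```python
import torch
import torch.nn as nn

def joint_asr_ar_loss(asr_loss, accent_logits, accent_labels, weight=0.7):
    """Single-framework multitask objective: interpolate the ASR loss
    with a cross-entropy accent-classification loss."""
    ar_loss = nn.functional.cross_entropy(accent_logits, accent_labels)
    return weight * asr_loss + (1 - weight) * ar_loss

asr_loss = torch.tensor(4.2)           # stand-in for a CTC/attention ASR loss
accent_logits = torch.randn(8, 8)      # batch of 8 utterances, 8 accent classes
accent_labels = torch.randint(0, 8, (8,))
total = joint_asr_ar_loss(asr_loss, accent_logits, accent_labels)
```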
no code implementations • 22 Oct 2020 • Yizhou Peng, Jicheng Zhang, Haobo Zhang, HaiHua Xu, Hao Huang, Eng Siong Chng
Experimental results on 8-accent English speech recognition show that both methods yield WERs close to those of conventional ASR systems that completely ignore the accent, as well as the desired AR accuracy.
no code implementations • MediaEval 2015 Workshop 2015 • Jingyong Hou, Van Tung Pham, Cheung-Chi Leung, Lei Wang, HaiHua Xu, Hang Lv, Lei Xie, Zhonghua Fu, Chongjia Ni, Xiong Xiao, Hongjie Chen, Shaofei Zhang, Sining Sun, Yougen Yuan, Pengcheng Li, Tin Lay Nwe, Sunil Sivadas, Bin Ma, Eng Siong Chng, Haizhou Li
This paper describes the system developed by the NNI team for the Query-by-Example Search on Speech Task (QUESST) in the MediaEval 2015 evaluation.
Ranked #9 on Keyword Spotting on QUESST
no code implementations • 16 Oct 2014 • Peng Yang, HaiHua Xu, Xiong Xiao, Lei Xie, Cheung-Chi Leung, Hongjie Chen, JIA YU, Hang Lv, Lei Wang, Su Jun Leow, Bin Ma, Eng Siong Chng, Haizhou Li
For both symbolic and DTW search, partial sequence matching is performed to reduce the miss rate, especially for query types 2 and 3; a toy sketch of this partial-matching idea via subsequence DTW is given after this entry.
Ranked #6 on Keyword Spotting on QUESST
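Following up on the partial sequence matching mentioned above, here is a toy sketch of subsequence DTW, in which the query may align to any contiguous region of the search utterance; the feature dimensionality and Euclidean frame distance are illustrative assumptions, not the system's actual configuration:

```python
import numpy as np

def subsequence_dtw(query, search):
    """Subsequence DTW: the query may start and end anywhere in the
    search utterance, which supports partial matching of queries."""
    Q, S = len(query), len(search)
    D = np.full((Q + 1, S + 1), np.inf)
    D[0, :] = 0.0                       # free start anywhere in `search`
    for i in range(1, Q + 1):
        for j in range(1, S + 1):
            cost = np.linalg.norm(query[i - 1] - search[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[Q].min()                   # free end: best cost over all endpoints

query  = np.random.randn(12, 13)    # e.g. 13-dim posterior/MFCC-like features
search = np.random.randn(200, 13)
print(subsequence_dtw(query, search))
```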