no code implementations • 23 Feb 2023 • Chen Chen, Yuchen Hu, Weiwei Weng, Eng Siong Chng
Deep neural network based speech enhancement techniques focus on learning a noisy-to-clean transformation supervised by paired training data.
no code implementations • 23 Feb 2023 • Chen Chen, Yuchen Hu, Heqing Zou, Linhui Sun, Eng Siong Chng
Deep neural network based speech enhancement approaches aim to learn a noisy-to-clean transformation using a supervised learning paradigm.
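As background, the supervised noisy-to-clean paradigm described in the two entries above can be reduced to a single training step; below is a minimal sketch in which the network, loss, and feature sizes are all illustrative assumptions rather than either paper's model.

```python
import torch
import torch.nn as nn

# Minimal supervised noisy-to-clean training step: a stand-in enhancement
# network regresses clean magnitude spectra from noisy ones with MSE.
net = nn.Sequential(nn.Linear(257, 512), nn.ReLU(), nn.Linear(512, 257))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

noisy = torch.rand(8, 257)   # paired training batch (noisy spectra)
clean = torch.rand(8, 257)   # corresponding clean targets

loss = nn.functional.mse_loss(net(noisy), clean)
opt.zero_grad()
loss.backward()
opt.step()
print(loss.item())
```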
1 code implementation • 22 Feb 2023 • Yuchen Hu, Chen Chen, Heqing Zou, Xionghu Zhong, Eng Siong Chng
To alleviate this problem, we propose a novel network to unify speech enhancement and separation with gradient modulation to improve noise-robustness.
1 code implementation • 22 Feb 2023 • Yuchen Hu, Chen Chen, Ruizhe Li, Qiushi Zhu, Eng Siong Chng
In this paper, we propose a simple yet effective approach called gradient remedy (GR) to resolve interference between task gradients in noise-robust speech recognition, from the perspectives of both angle and magnitude (a sketch of the general idea follows below).
Automatic Speech Recognition (ASR) +4
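The gradient-conflict idea above can be pictured as an angle correction between two task gradients; the sketch below follows the spirit of projection methods such as PCGrad and is not the paper's exact gradient remedy formulation. The function name and two-gradient setup are illustrative assumptions.

```python
import torch

def remedy_gradients(g_main: torch.Tensor, g_aux: torch.Tensor) -> torch.Tensor:
    """Illustrative gradient-conflict fix (not the paper's exact method):
    if the auxiliary gradient points against the main one (negative cosine),
    project it onto the plane orthogonal to the main gradient so the two
    task gradients no longer interfere in angle."""
    dot = torch.dot(g_aux, g_main)
    if dot < 0:  # conflicting directions
        g_aux = g_aux - dot / (g_main.norm() ** 2 + 1e-12) * g_main
    return g_main + g_aux

# Toy usage with two conflicting gradients
g1 = torch.tensor([1.0, 0.0])
g2 = torch.tensor([-0.5, 1.0])   # negative cosine with g1
print(remedy_gradients(g1, g2))  # the component of g2 along -g1 is removed
```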
1 code implementation • 19 Feb 2023 • Alexey Sholokhov, Nikita Kuzmin, Kong Aik Lee, Eng Siong Chng
This paper focuses on multi-enrollment speaker recognition, which naturally occurs in the task of online speaker clustering, and studies the properties of different scoring back-ends in this scenario.
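One common multi-enrollment back-end simply averages the enrollment embeddings before cosine scoring; a minimal sketch of that baseline follows (one of the back-ends one might compare, not necessarily the paper's recommended choice, and the embedding size is an assumption).

```python
import torch

def multi_enroll_cosine(enroll_embs: torch.Tensor, test_emb: torch.Tensor) -> float:
    """Embedding-averaging back-end: average several enrollment embeddings
    of one speaker, then score a test embedding by cosine similarity."""
    centroid = enroll_embs.mean(dim=0)
    return torch.nn.functional.cosine_similarity(centroid, test_emb, dim=0).item()

enrolls = torch.randn(3, 192)    # three enrollment utterances of one speaker
print(multi_enroll_cosine(enrolls, torch.randn(192)))
```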
1 code implementation • 16 Feb 2023 • Shangeth Rajaa, Kriti Anandan, Swaraj Dalmia, Tarun Gupta, Eng Siong Chng
The pre-trained multi-lingual XLSR model generalizes well for language identification after fine-tuning on unseen languages.
no code implementations • 10 Dec 2022 • Chen Chen, Yuchen Hu, Qiang Zhang, Heqing Zou, Beier Zhu, Eng Siong Chng
Audio-visual speech recognition (AVSR) has achieved remarkable success in improving the noise-robustness of speech recognition.
1 code implementation • 1 Nov 2022 • Yuhang Yang, HaiHua Xu, Hao Huang, Eng Siong Chng, Sheng Li
To let a state-of-the-art end-to-end ASR model benefit from data efficiency, as well as from much larger amounts of unpaired text data through multi-modal training, two problems must be addressed: 1) the synchronicity of feature sampling rates between speech and language (i.e., text) data; 2) the homogeneity of the representations learned by the two encoders.
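One common way to address the first problem, the mismatched sequence rates of speech frames and text tokens, is to downsample the speech encoder output; the sketch below uses a strided 1-D convolution for that purpose. The module and its sizes are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SpeechDownsampler(nn.Module):
    """Illustrative length reducer: a strided Conv1d that shrinks the speech
    frame rate by `factor` so speech and text encoder outputs have
    comparable sequence lengths (hypothetical module, not the paper's)."""
    def __init__(self, dim: int = 256, factor: int = 4):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel_size=factor, stride=factor)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim) -> (batch, time // factor, dim)
        return self.conv(x.transpose(1, 2)).transpose(1, 2)

frames = torch.randn(2, 100, 256)          # 100 speech frames
print(SpeechDownsampler()(frames).shape)   # torch.Size([2, 25, 256])
```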
no code implementations • 1 Aug 2022 • Jia Qi Yip, Dianwen Ng, Bin Ma, Konstantin Pervushin, Eng Siong Chng
Nuclear Magnetic Resonance (NMR) is used in structural biology to experimentally determine the structure of proteins, which is used in many areas of biology and is an important part of drug development.
1 code implementation • 15 Jul 2022 • Yang Xiao, Xubo Liu, James King, Arshdeep Singh, Eng Siong Chng, Mark D. Plumbley, Wenwu Wang
Experimental results on the DCASE 2019 Task 1 and ESC-50 datasets show that our proposed method outperforms baseline continual learning methods in classification accuracy and computational efficiency, indicating that it can efficiently and incrementally learn new classes without catastrophic forgetting for on-device environmental sound classification.
no code implementations • 9 Jul 2022 • Yizhou Peng, Yufei Liu, Jicheng Zhang, HaiHua Xu, Yi He, Hao Huang, Eng Siong Chng
More importantly, we train an end-to-end (E2E) speech recognition model by means of merging two monolingual data sets and observe the efficacy of the proposed ILME-based LM fusion for CSSR.
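Internal language model estimation (ILME) based fusion combines scores log-linearly during decoding, adding the external LM score and subtracting the E2E model's estimated internal LM score; a minimal sketch follows, where the function name and interpolation weights are illustrative assumptions.

```python
def ilme_fused_score(asr_logp: float, ext_lm_logp: float,
                     ilm_logp: float, lam: float = 0.3, mu: float = 0.2) -> float:
    """ILME-style log-linear fusion: add the external LM score and subtract
    the estimated internal LM score so the E2E model's implicit language
    prior is not double-counted (weights are illustrative)."""
    return asr_logp + lam * ext_lm_logp - mu * ilm_logp

# Scoring one candidate token during beam search (toy numbers)
print(ilme_fused_score(asr_logp=-1.2, ext_lm_logp=-2.0, ilm_logp=-3.1))
```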
no code implementations • 9 Jul 2022 • Jicheng Zhang, Yizhou Peng, HaiHua Xu, Yi He, Eng Siong Chng, Hao Huang
Intermediate layer output (ILO) regularization by means of multitask training on the encoder side has been shown to be an effective approach to yielding improved results across a wide range of end-to-end ASR frameworks.
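ILO regularization attaches an auxiliary loss to an intermediate encoder layer's output; a minimal sketch of such a multitask objective follows, with the layer choice and interpolation weight as assumptions.

```python
import torch

def ilo_regularized_loss(final_loss: torch.Tensor,
                         intermediate_loss: torch.Tensor,
                         alpha: float = 0.3) -> torch.Tensor:
    """Multitask objective with intermediate layer output regularization:
    the training loss is also computed on an intermediate encoder layer's
    output and interpolated with the final-layer loss."""
    return (1 - alpha) * final_loss + alpha * intermediate_loss

print(ilo_regularized_loss(torch.tensor(2.0), torch.tensor(2.6)))
```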
no code implementations • 29 Jun 2022 • Andrew Koh, Eng Siong Chng
In this paper, we tackle the new Language-Based Audio Retrieval task proposed in DCASE 2022.
no code implementations • 13 Apr 2022 • Chen Chen, Yuchen Hu, Nana Hou, Xiaofeng Qi, Heqing Zou, Eng Siong Chng
Although the automatic speech recognition (ASR) task has achieved remarkable success with sequence-to-sequence models, there are two main mismatches between training and testing that may degrade performance: 1) the typically used cross-entropy criterion aims to maximize the log-likelihood of the training data, while performance is evaluated by word error rate (WER), not log-likelihood; 2) the teacher-forcing method creates a dependence on ground truth during training, which means the model has never been exposed to its own predictions before testing (one standard remedy for the latter is sketched below).
Automatic Speech Recognition (ASR) +2
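A standard remedy for the teacher-forcing mismatch described above is scheduled sampling, in which the decoder is sometimes fed its own previous prediction instead of the ground truth; the sketch below illustrates that idea and is not this paper's specific approach. The function name and dummy model are assumptions.

```python
import random
import torch

def scheduled_sampling_inputs(targets: list, logits_fn, sample_prob: float = 0.3) -> list:
    """Scheduled sampling (illustrative remedy for exposure bias): with
    probability `sample_prob`, feed the decoder its own previous prediction
    rather than the ground-truth token, exposing it to its own outputs."""
    inputs = [targets[0]]                                # BOS-like first token
    for t in range(1, len(targets)):
        if random.random() < sample_prob:
            prev = logits_fn(inputs).argmax(-1).item()   # model's own prediction
        else:
            prev = targets[t - 1]                        # teacher forcing
        inputs.append(prev)
    return inputs

# Toy usage: a dummy "model" that always predicts token 0
print(scheduled_sampling_inputs([1, 2, 3, 4], lambda hist: torch.zeros(10)))
```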
1 code implementation • 30 Mar 2022 • Yang Xiao, Nana Hou, Eng Siong Chng
Catastrophic forgetting is a thorny challenge when updating keyword spotting (KWS) models after deployment.
no code implementations • 29 Mar 2022 • Chen Chen, Nana Hou, Yuchen Hu, Shashank Shirol, Eng Siong Chng
Noise-robust speech recognition systems require large amounts of training data, including noisy speech and corresponding transcripts, to achieve state-of-the-art performance in the face of diverse practical environments.
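Such training data is typically simulated by mixing clean speech with noise at a chosen signal-to-noise ratio; a minimal sketch of that mixing follows (generic simulation, not necessarily this paper's pipeline).

```python
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale the noise so the clean-to-noise power ratio matches the target
    SNR in dB, then add it to the clean waveform."""
    noise = np.resize(noise, clean.shape)      # loop/trim noise to length
    target_ratio = 10 ** (snr_db / 10)
    scale = np.sqrt((clean ** 2).mean() / (target_ratio * (noise ** 2).mean() + 1e-12))
    return clean + scale * noise

noisy = mix_at_snr(np.random.randn(16000), np.random.randn(8000), snr_db=5.0)
print(noisy.shape)
```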
no code implementations • 29 Mar 2022 • Chen Chen, Nana Hou, Yuchen Hu, Heqing Zou, Xiaofeng Qi, Eng Siong Chng
Automated audio captioning (AAC) is a cross-modal task that generates natural language to describe the content of input audio.
1 code implementation • 29 Mar 2022 • Heqing Zou, Yuke Si, Chen Chen, Deepu Rajan, Eng Siong Chng
In this paper, we propose an end-to-end speech emotion recognition system using multi-level acoustic information with a newly designed co-attention module.
1 code implementation • 28 Mar 2022 • Yuchen Hu, Nana Hou, Chen Chen, Eng Siong Chng
To alleviate this, we propose a dual-path style learning approach for end-to-end noise-robust automatic speech recognition (DPSL-ASR).
Automatic Speech Recognition (ASR) +3
1 code implementation • 21 Feb 2022 • Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li
Speaker extraction aims to extract the target speaker's voice from a multi-talker speech mixture given an auxiliary reference utterance.
1 code implementation • EMNLP 2021 • Yingzhu Zhao, Chongjia Ni, Cheung-Chi Leung, Shafiq Joty, Eng Siong Chng, Bin Ma
For model adaptation, we use a novel gradual pruning method to adapt to target speakers without changing the model architecture, which, to the best of our knowledge, has never been explored in ASR (a generic pruning sketch follows below).
Automatic Speech Recognition (ASR) +1
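The paper's adaptation-specific schedule isn't reproduced here; the sketch below shows the generic idea of gradually increasing magnitude-based pruning on one weight matrix, with the sparsity schedule as an assumed illustration.

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Return a binary keep-mask that zeros out the smallest-magnitude
    fraction of weights (illustrative magnitude pruning)."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()

w = torch.randn(4, 4)
for sparsity in (0.2, 0.5, 0.8):    # gradually increase sparsity over steps
    mask = magnitude_prune(w, sparsity)
    w = w * mask                    # prune, then (in practice) fine-tune
print((w == 0).float().mean())      # prints 0.75 (12 of 16 weights pruned)
```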
1 code implementation • 11 Oct 2021 • Yuchen Hu, Nana Hou, Chen Chen, Eng Siong Chng
Speech enhancement (SE) aims to suppress the additive noise in a noisy speech signal to improve the speech's perceptual quality and intelligibility (a classic baseline is sketched below).
Automatic Speech Recognition (ASR) +3
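For background on the SE formulation above, here is a classic spectral-subtraction baseline, a textbook sketch rather than this paper's method; the frame count used for noise estimation and the spectral floor are assumptions.

```python
import numpy as np

def spectral_subtract(noisy: np.ndarray, n_fft: int = 512) -> np.ndarray:
    """Classic spectral subtraction: estimate the noise magnitude from the
    leading frames and subtract it from every frame's magnitude spectrum,
    keeping the noisy phase."""
    hop = n_fft // 2
    frames = [noisy[i:i + n_fft] for i in range(0, len(noisy) - n_fft, hop)]
    spec = np.fft.rfft(np.stack(frames) * np.hanning(n_fft), axis=1)
    mag, phase = np.abs(spec), np.angle(spec)
    noise_mag = mag[:5].mean(axis=0)                    # assume leading frames are noise
    clean_mag = np.maximum(mag - noise_mag, 0.1 * mag)  # spectral floor
    return clean_mag * np.exp(1j * phase)               # enhanced complex spectrum

enhanced = spectral_subtract(np.random.randn(16000))
print(enhanced.shape)   # (num_frames, n_fft // 2 + 1)
```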
no code implementations • 7 Oct 2021 • Yizhou Peng, Jicheng Zhang, HaiHua Xu, Hao Huang, Eng Siong Chng
A non-autoregressive end-to-end ASR framework may be particularly appropriate for the code-switching recognition task thanks to its inherent property that the present output token is independent of historical ones.
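The frame-wise independence described above can be illustrated with greedy CTC decoding, where each frame's token is chosen without conditioning on previously emitted tokens; a minimal sketch under the assumption of a CTC-style output layer, not necessarily this paper's exact framework.

```python
import torch

def ctc_greedy_decode(log_probs: torch.Tensor, blank: int = 0) -> list:
    """Greedy CTC decoding (illustrative): pick the best token per frame
    independently of other frames, then collapse repeats and drop blanks."""
    best = log_probs.argmax(dim=-1).tolist()   # frame-wise independent argmax
    out, prev = [], blank
    for token in best:
        if token != blank and token != prev:
            out.append(token)
        prev = token
    return out

logits = torch.randn(50, 30).log_softmax(-1)   # 50 frames, 30-symbol vocab
print(ctc_greedy_decode(logits))
```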
no code implementations • 10 Aug 2021 • Andrew Koh, Fuzhao Xue, Eng Siong Chng
In this paper, we examine the use of Transfer Learning using Pretrained Audio Neural Networks (PANNs), and propose an architecture that is able to better leverage the acoustic features provided by PANNs for the Automated Audio Captioning Task.
no code implementations • 22 Jul 2021 • Duo Ma, Nana Hou, Van Tung Pham, HaiHua Xu, Eng Siong Chng
One advantage of the proposed method is that the entire system can be trained from scratch.
Automatic Speech Recognition (ASR) +3
no code implementations • 15 Jun 2021 • Jicheng Zhang, Yizhou Peng, Pham Van Tung, HaiHua Xu, Hao Huang, Eng Siong Chng
In this paper, we propose a single multi-task learning framework to perform End-to-End (E2E) speech recognition (ASR) and accent recognition (AR) simultaneously.
no code implementations • 13 Jan 2021 • Manav Kaushik, Van Tung Pham, Eng Siong Chng
In this work, we propose a novel approach of using attention mechanism to build an end-to-end architecture for height and age estimation.
1 code implementation • 27 Dec 2020 • Fuzhao Xue, Aixin Sun, Hao Zhang, Jinjie Ni, Eng Siong Chng
Dialogue relation extraction (RE) aims to predict the relation type of two entities mentioned in a dialogue.
Ranked #8 on Dialog Relation Extraction on DialogRE
1 code implementation • 12 Dec 2020 • Fuzhao Xue, Aixin Sun, Hao Zhang, Eng Siong Chng
Recent advances on the RE task come from BERT-based sequence modeling and graph-based modeling of relationships among the tokens in the sequence.
Ranked #4 on Dialog Relation Extraction on DialogRE (F1c (v1) metric)
no code implementations • 19 Nov 2020 • Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li
Speaker extraction requires a sample speech from the target speaker as the reference.
no code implementations • 22 Oct 2020 • Yizhou Peng, Jicheng Zhang, Haobo Zhang, HaiHua Xu, Hao Huang, Eng Siong Chng
Experimental results on an 8-accent English speech recognition task show that both methods can yield WERs close to those of conventional ASR systems that completely ignore the accent, as well as the desired AR accuracy.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Boon Peng Yap, Andrew Koh, Eng Siong Chng
Domain adaptation or transfer learning using pre-trained language models such as BERT has proven to be an effective approach for many natural language processing tasks.
no code implementations • 21 May 2020 • Zhiping Zeng, Van Tung Pham, Hai-Hua Xu, Yerbolat Khassanov, Eng Siong Chng, Chongjia Ni, Bin Ma
To this end, we extend our prior work [1], and propose a hybrid Transformer-LSTM based architecture.
no code implementations • 18 May 2020 • Tingzhi Mao, Yerbolat Khassanov, Van Tung Pham, Hai-Hua Xu, Hao Huang, Eng Siong Chng
In this paper, we present a series of complementary approaches to improve the recognition of underrepresented named entities (NE) in hybrid ASR systems without compromising overall word error rate performance.
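Word error rate, the metric referenced above, is the word-level edit distance between hypothesis and reference divided by the reference length; a minimal self-contained sketch:

```python
def wer(ref: list, hyp: list) -> float:
    """Word error rate: (substitutions + deletions + insertions) / len(ref),
    computed with standard Levenshtein dynamic programming."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the cat sat".split(), "the cat sit down".split()))  # 2/3
```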
no code implementations • 10 May 2020 • Meng Ge, Cheng-Lin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li
To eliminate such mismatch, we propose a complete time-domain speaker extraction solution called SpEx+.
Audio and Speech Processing • Sound
no code implementations • 29 Apr 2020 • Cheng-Lin Xu, Wei Rao, Eng Siong Chng, Haizhou Li
The inaccuracy of phase estimation is inherent to frequency-domain processing and affects the quality of signal reconstruction.
Audio and Speech Processing • Sound
1 code implementation • 17 Apr 2020 • Cheng-Lin Xu, Wei Rao, Eng Siong Chng, Haizhou Li
Inspired by Conv-TasNet, we propose a time-domain speaker extraction network (SpEx) that converts the mixture speech into multi-scale embedding coefficients instead of decomposing the speech signal into magnitude and phase spectra.
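The multi-scale encoder described above can be pictured as parallel 1-D convolutions with short, middle, and long windows over the raw waveform; the sketch below is an illustrative reduction in that spirit, with channel counts and window lengths assumed rather than taken from the paper.

```python
import torch
import torch.nn as nn

class MultiScaleEncoder(nn.Module):
    """Illustrative multi-scale time-domain encoder in the spirit of SpEx:
    parallel Conv1d layers with different window lengths produce embedding
    coefficients directly from the waveform (sizes are assumptions)."""
    def __init__(self, channels: int = 256, windows=(20, 80, 160)):
        super().__init__()
        stride = windows[0] // 2   # shared stride so frame rates roughly align
        self.convs = nn.ModuleList(
            nn.Conv1d(1, channels, kernel_size=w, stride=stride) for w in windows
        )

    def forward(self, wav: torch.Tensor) -> list:
        # wav: (batch, samples) -> list of (batch, channels, frames)
        x = wav.unsqueeze(1)
        return [torch.relu(conv(x)) for conv in self.convs]

outs = MultiScaleEncoder()(torch.randn(2, 16000))
print([o.shape[-1] for o in outs])   # frame counts differ slightly per window
```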
no code implementations • 25 Nov 2019 • Van Tung Pham, Hai-Hua Xu, Yerbolat Khassanov, Zhiping Zeng, Eng Siong Chng, Chongjia Ni, Bin Ma, Haizhou Li
To address this problem, in this work, we propose a new architecture that separates the decoder subnet from the encoder output.
Automatic Speech Recognition (ASR) +2
no code implementations • 16 Apr 2019 • Kong Aik Lee, Ville Hautamaki, Tomi Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Hector Delgado, Jose Patino, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda, Trung Ngo Trong, Md Sahidullah, Fan Lu, Yun Tang, Ming Tu, Kah Kuan Teh, Huy Dat Tran, Kuruvachan K. George, Ivan Kukanov, Florent Desnous, Jichen Yang, Emre Yilmaz, Longting Xu, Jean-Francois Bonastre, Cheng-Lin Xu, Zhi Hao Lim, Eng Siong Chng, Shivesh Ranjan, John H. L. Hansen, Massimiliano Todisco, Nicholas Evans
The I4U consortium was established to facilitate a joint entry to NIST speaker recognition evaluations (SRE).
no code implementations • 8 Apr 2019 • Yerbolat Khassanov, Hai-Hua Xu, Van Tung Pham, Zhiping Zeng, Eng Siong Chng, Chongjia Ni, Bin Ma
The lack of code-switch training data is one of the major concerns in the development of end-to-end code-switching automatic speech recognition (ASR) models.
Automatic Speech Recognition (ASR) +1
1 code implementation • 8 Apr 2019 • Yerbolat Khassanov, Zhiping Zeng, Van Tung Pham, Hai-Hua Xu, Eng Siong Chng
However, learning the representation of rare words is a challenging problem causing the NLM to produce unreliable probability estimates.
1 code implementation • 24 Mar 2019 • Cheng-Lin Xu, Wei Rao, Eng Siong Chng, Haizhou Li
The SpeakerBeam-FE (SBF) method is proposed for speaker extraction.
1 code implementation • 1 Nov 2018 • Zhiping Zeng, Yerbolat Khassanov, Van Tung Pham, Hai-Hua Xu, Eng Siong Chng, Haizhou Li
Code-switching (CS) refers to a linguistic phenomenon where a speaker uses different languages in an utterance or between alternating utterances.
no code implementations • WS 2018 • Zhongwei Li, Xuancong Wang, Ai Ti Aw, Eng Siong Chng, Haizhou Li
Customized translation needs to pay special attention to target-domain terminology, especially the named entities of the domain.
no code implementations • 27 Jun 2018 • Yerbolat Khassanov, Eng Siong Chng
Additionally, we propose to generate the list of OOS words to expand the vocabulary in an unsupervised manner by automatically extracting them from ASR output.
Automatic Speech Recognition (ASR) +1
no code implementations • 16 Jun 2018 • Pengcheng Guo, Hai-Hua Xu, Lei Xie, Eng Siong Chng
In this paper, we present our overall efforts to improve the performance of a code-switching speech recognition system using semi-supervised training methods from lexicon learning to acoustic modeling, on the South East Asian Mandarin-English (SEAME) data.
no code implementations • 9 Feb 2016 • Xiaohai Tian, Zhizheng Wu, Xiong Xiao, Eng Siong Chng, Haizhou Li
To simulate real-life scenarios, we perform a preliminary investigation of spoofing detection under additive noisy conditions, and also describe an initial database for this task.
no code implementations • MediaEval 2015 Workshop • Jingyong Hou, Van Tung Pham, Cheung-Chi Leung, Lei Wang, HaiHua Xu, Hang Lv, Lei Xie, Zhonghua Fu, Chongjia Ni, Xiong Xiao, Hongjie Chen, Shaofei Zhang, Sining Sun, Yougen Yuan, Pengcheng Li, Tin Lay Nwe, Sunil Sivadas, Bin Ma, Eng Siong Chng, Haizhou Li
This paper describes the system developed by the NNI team for the Query-by-Example Search on Speech Task (QUESST) in the MediaEval 2015 evaluation.
Ranked #9 on Keyword Spotting on QUESST
no code implementations • 16 Oct 2014 • Peng Yang, HaiHua Xu, Xiong Xiao, Lei Xie, Cheung-Chi Leung, Hongjie Chen, Jia Yu, Hang Lv, Lei Wang, Su Jun Leow, Bin Ma, Eng Siong Chng, Haizhou Li
For both symbolic and DTW search, partial sequence matching is performed to reduce the miss rate, especially for query types 2 and 3 (a sketch of such partial matching follows below).
Ranked #6 on Keyword Spotting on QUESST
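The partial sequence matching used in the DTW search above can be illustrated with subsequence DTW, which aligns the whole query against the best-matching region inside a longer utterance; below is a minimal sketch with illustrative feature dimensions, not the system's actual implementation.

```python
import numpy as np

def subsequence_dtw_cost(query: np.ndarray, doc: np.ndarray) -> float:
    """Subsequence DTW (illustrative): the query may start and end at any
    frame of the document, so a short spoken query can match a partial
    region of a longer utterance."""
    n, m = len(query), len(doc)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, :] = 0.0                          # free start anywhere in the doc
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(query[i - 1] - doc[j - 1])
            acc[i, j] = cost + min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    return acc[n, 1:].min()                  # free end anywhere in the doc

q = np.random.randn(10, 13)                  # e.g. 10 MFCC frames
d = np.random.randn(100, 13)                 # a longer utterance
print(subsequence_dtw_cost(q, d))
```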