Search Results for author: Eng Siong Chng

Found 69 papers, 29 papers with code

Aligning Speech to Languages to Enhance Code-switching Speech Recognition

no code implementations • 9 Mar 2024 • Hexin Liu, Xiangyu Zhang, Leibny Paola Garcia, Andy W. H. Khong, Eng Siong Chng, Shinji Watanabe

Performance evaluation using large language models reveals the advantage of the linguistic hint by achieving 14. 1% and 5. 5% relative improvement on test sets of the ASRU and SEAME datasets, respectively.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Speaking in Wavelet Domain: A Simple and Efficient Approach to Speed up Speech Diffusion Model

no code implementations • 16 Feb 2024 • Xiangyu Zhang, Daijiao Liu, Hexin Liu, Qiquan Zhang, Hanyu Meng, Leibny Paola Garcia, Eng Siong Chng, Lina Yao

Recently, Denoising Diffusion Probabilistic Models (DDPMs) have attained leading performances across a diverse range of generative tasks.

Denoising Speech Enhancement +1

Paper
Add Code

GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators

1 code implementation • 10 Feb 2024 • Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Ruizhe Li, Dong Zhang, Zhehuai Chen, Eng Siong Chng

Leveraging the rich linguistic knowledge and strong reasoning abilities of LLMs, our new paradigm can integrate the rich information in N-best candidates to generate a higher-quality translation result.

Machine Translation Translation

126

Paper
Code

ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge

no code implementations • 7 Jan 2024 • He Wang, Pengcheng Guo, Yue Li, Ao Zhang, Jiayao Sun, Lei Xie, Wei Chen, Pan Zhou, Hui Bu, Xin Xu, BinBin Zhang, Zhuo Chen, Jian Wu, Longbiao Wang, Eng Siong Chng, Sun Li

To promote speech processing and recognition research in driving scenarios, we build on the success of the Intelligent Cockpit Speech Recognition Challenge (ICSRC) held at ISCSLP 2022 and launch the ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Adapting OpenAI's Whisper for Speech Recognition on Code-Switch Mandarin-English SEAME and ASRU2019 Datasets

no code implementations • 29 Nov 2023 • Yuhang Yang, Yizhou Peng, Xionghu Zhong, Hao Huang, Eng Siong Chng

The Mixed Error Rate results show that the amount of adaptation data may be as low as $1\sim10$ hours to achieve saturation in performance gain (SEAME) while the ASRU task continued to show performance with more adaptation data ($>$100 hours).

speech-recognition Speech Recognition

Paper
Add Code

Generative error correction for code-switching speech recognition using large language models

no code implementations • 17 Oct 2023 • Chen Chen, Yuchen Hu, Chao-Han Huck Yang, Hexin Liu, Sabato Marco Siniscalchi, Eng Siong Chng

In this work, we propose to leverage large language models (LLMs) and lists of hypotheses generated by an ASR to address the CS problem.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models

1 code implementation • NeurIPS 2023 • Chen Chen, Yuchen Hu, Chao-Han Huck Yang, Sabato Macro Siniscalchi, Pin-Yu Chen, Eng Siong Chng

We make our results publicly accessible for reproducible pipelines with released pre-trained models, thus providing a new evaluation paradigm for ASR error correction with LLMs.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Code

SPGM: Prioritizing Local Features for enhanced speech separation performance

1 code implementation • 22 Sep 2023 • Jia Qi Yip, Shengkui Zhao, Yukun Ma, Chongjia Ni, Chong Zhang, Hao Wang, Trung Hieu Nguyen, Kun Zhou, Dianwen Ng, Eng Siong Chng, Bin Ma

Dual-path is a popular architecture for speech separation models (e. g. Sepformer) which splits long sequences into overlapping chunks for its intra- and inter-blocks that separately model intra-chunk local features and inter-chunk global relationships.

Ranked #5 on Speech Separation on WSJ0-2mix

Speech Separation

Paper
Code

Codec Data Augmentation for Time-domain Heart Sound Classification

no code implementations • 14 Sep 2023 • Ansh Mishra, Jia Qi Yip, Eng Siong Chng

In this work, we propose a simple time domain approach, to the heart sound classification problem with a base classification error rate of 0. 8 and show that augmentation of the data through codec simulation can improve the classification error rate to 0. 2.

Classification Data Augmentation +1

Paper
Add Code

Noise-aware Speech Enhancement using Diffusion Probabilistic Model

1 code implementation • 16 Jul 2023 • Yuchen Hu, Chen Chen, Ruizhe Li, Qiushi Zhu, Eng Siong Chng

Specifically, we design a noise classification (NC) model to produce acoustic embedding as a noise conditioner for guiding the reverse denoising process.

Denoising Multi-Task Learning +2

Paper
Code

Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition

1 code implementation • 18 Jun 2023 • Yuchen Hu, Ruizhe Li, Chen Chen, Chengwei Qin, Qiushi Zhu, Eng Siong Chng

In this work, we investigate the noise-invariant visual modality to strengthen robustness of AVSR, which can adapt to any testing noises while without dependence on noisy training data, a. k. a., unsupervised noise adaptation.

Audio-Visual Speech Recognition speech-recognition +1

Paper
Code

MIR-GAN: Refining Frame-Level Modality-Invariant Representations with Adversarial Network for Audio-Visual Speech Recognition

1 code implementation • 18 Jun 2023 • Yuchen Hu, Chen Chen, Ruizhe Li, Heqing Zou, Eng Siong Chng

In this paper, we aim to learn the shared representations across modalities to bridge their gap.

Audio-Visual Speech Recognition Representation Learning +3

Paper
Code

A Neural State-Space Model Approach to Efficient Speech Separation

1 code implementation • 26 May 2023 • Chen Chen, Chao-Han Huck Yang, Kai Li, Yuchen Hu, Pin-Jui Ku, Eng Siong Chng

In this work, we introduce S4M, a new efficient speech separation framework based on neural state-space models (SSM).

Representation Learning Speech Separation

Paper
Code

ACA-Net: Towards Lightweight Speaker Verification using Asymmetric Cross Attention

1 code implementation • 20 May 2023 • Jia Qi Yip, Tuan Truong, Dianwen Ng, Chong Zhang, Yukun Ma, Trung Hieu Nguyen, Chongjia Ni, Shengkui Zhao, Eng Siong Chng, Bin Ma

In this paper, we propose ACA-Net, a lightweight, global context-aware speaker embedding extractor for Speaker Verification (SV) that improves upon existing work by using Asymmetric Cross Attention (ACA) to replace temporal pooling.

Speaker Verification

Paper
Code

UniS-MMC: Multimodal Classification via Unimodality-supervised Multimodal Contrastive Learning

1 code implementation • 16 May 2023 • Heqing Zou, Meng Shen, Chen Chen, Yuchen Hu, Deepu Rajan, Eng Siong Chng

Multimodal learning aims to imitate human beings to acquire complementary information from multiple modalities for various downstream tasks.

Contrastive Learning Image-text Classification +2

Paper
Code

Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition

1 code implementation • 16 May 2023 • Yuchen Hu, Ruizhe Li, Chen Chen, Heqing Zou, Qiushi Zhu, Eng Siong Chng

However, most existing AVSR approaches simply fuse the audio and visual features by concatenation, without explicit interactions to capture the deep correlations between them, which results in sub-optimal multimodal representations for downstream speech recognition task.

Audio-Visual Speech Recognition Automatic Speech Recognition +3

Paper
Code

Leveraging Audio-Tagging Assisted Sound Event Detection using Weakified Strong Labels and Frequency Dynamic Convolutions

no code implementations • 25 Apr 2023 • Tanmay Khandelwal, Rohan Kumar Das, Andrew Koh, Eng Siong Chng

Stage-1 of our proposed framework focuses on audio-tagging (AT), which assists the sound event detection (SED) system in Stage-2.

Audio Tagging Event Detection +1

Paper
Add Code

Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR

no code implementations • 11 Apr 2023 • Yuchen Hu, Chen Chen, Qiushi Zhu, Eng Siong Chng

Second, during finetuning we propose a Transformer-based code predictor to accurately predict clean codes by modeling the global dependency of input noisy representations, which enables discovery and restoration of high-quality clean representations without distortions.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Unsupervised Noise adaptation using Data Simulation

no code implementations • 23 Feb 2023 • Chen Chen, Yuchen Hu, Heqing Zou, Linhui Sun, Eng Siong Chng

Deep neural network based speech enhancement approaches aim to learn a noisy-to-clean transformation using a supervised learning paradigm.

Domain Adaptation Generative Adversarial Network +1

Paper
Add Code

Metric-oriented Speech Enhancement using Diffusion Probabilistic Model

no code implementations • 23 Feb 2023 • Chen Chen, Yuchen Hu, Weiwei Weng, Eng Siong Chng

Deep neural network based speech enhancement technique focuses on learning a noisy-to-clean transformation supervised by paired training data.

Speech Enhancement

Paper
Add Code

Unifying Speech Enhancement and Separation with Gradient Modulation for End-to-End Noise-Robust Speech Separation

1 code implementation • 22 Feb 2023 • Yuchen Hu, Chen Chen, Heqing Zou, Xionghu Zhong, Eng Siong Chng

To alleviate this problem, we propose a novel network to unify speech enhancement and separation with gradient modulation to improve noise-robustness.

Multi-Task Learning Speech Enhancement +2

Paper
Code

Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition

1 code implementation • 22 Feb 2023 • Yuchen Hu, Chen Chen, Ruizhe Li, Qiushi Zhu, Eng Siong Chng

In this paper, we propose a simple yet effective approach called gradient remedy (GR) to solve interference between task gradients in noise-robust speech recognition, from perspectives of both angle and magnitude.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Paper
Code

Probabilistic Back-ends for Online Speaker Recognition and Clustering

1 code implementation • 19 Feb 2023 • Alexey Sholokhov, Nikita Kuzmin, Kong Aik Lee, Eng Siong Chng

This paper focuses on multi-enrollment speaker recognition which naturally occurs in the task of online speaker clustering, and studies the properties of different scoring back-ends in this scenario.

Clustering Online Clustering +1

Paper
Code

Improving Spoken Language Identification with Map-Mix

1 code implementation • 16 Feb 2023 • Shangeth Rajaa, Kriti Anandan, Swaraj Dalmia, Tarun Gupta, Eng Siong Chng

The pre-trained multi-lingual XLSR model generalizes well for language identification after fine-tuning on unseen languages.

Data Augmentation Language Identification +1

Paper
Code

Leveraging Modality-specific Representations for Audio-visual Speech Recognition via Reinforcement Learning

no code implementations • 10 Dec 2022 • Chen Chen, Yuchen Hu, Qiang Zhang, Heqing Zou, Beier Zhu, Eng Siong Chng

Audio-visual speech recognition (AVSR) has gained remarkable success for ameliorating the noise-robustness of speech recognition.

Audio-Visual Speech Recognition reinforcement-learning +3

Paper
Add Code

Speech-text based multi-modal training with bidirectional attention for improved speech recognition

1 code implementation • 1 Nov 2022 • Yuhang Yang, HaiHua Xu, Hao Huang, Eng Siong Chng, Sheng Li

To let the state-of-the-art end-to-end ASR model enjoy data efficiency, as well as much more unpaired text data by multi-modal training, one needs to address two problems: 1) the synchronicity of feature sampling rates between speech and language (aka text data); 2) the homogeneity of the learned representations from two encoders.

speech-recognition Speech Recognition

Paper
Code

Amino Acid Classification in 2D NMR Spectra via Acoustic Signal Embeddings

no code implementations • 1 Aug 2022 • Jia Qi Yip, Dianwen Ng, Bin Ma, Konstantin Pervushin, Eng Siong Chng

Nuclear Magnetic Resonance (NMR) is used in structural biology to experimentally determine the structure of proteins, which is used in many areas of biology and is an important part of drug development.

Speaker Verification

Paper
Add Code

Continual Learning For On-Device Environmental Sound Classification

1 code implementation • 15 Jul 2022 • Yang Xiao, Xubo Liu, James King, Arshdeep Singh, Eng Siong Chng, Mark D. Plumbley, Wenwu Wang

Experimental results on the DCASE 2019 Task 1 and ESC-50 dataset show that our proposed method outperforms baseline continual learning methods on classification accuracy and computational efficiency, indicating our method can efficiently and incrementally learn new classes without the catastrophic forgetting problem for on-device environmental sound classification.

Classification Computational Efficiency +3

Paper
Code

Internal Language Model Estimation based Language Model Fusion for Cross-Domain Code-Switching Speech Recognition

no code implementations • 9 Jul 2022 • Yizhou Peng, Yufei Liu, Jicheng Zhang, HaiHua Xu, Yi He, Hao Huang, Eng Siong Chng

More importantly, we train an end-to-end (E2E) speech recognition model by means of merging two monolingual data sets and observe the efficacy of the proposed ILME-based LM fusion for CSSR.

Language Modelling speech-recognition +1

Paper
Add Code

Intermediate-layer output Regularization for Attention-based Speech Recognition with Shared Decoder

no code implementations • 9 Jul 2022 • Jicheng Zhang, Yizhou Peng, HaiHua Xu, Yi He, Eng Siong Chng, Hao Huang

Intermediate layer output (ILO) regularization by means of multitask training on encoder side has been shown to be an effective approach to yielding improved results on a wide range of end-to-end ASR frameworks.

speech-recognition Speech Recognition

Paper
Add Code

Language-Based Audio Retrieval with Converging Tied Layers and Contrastive Loss

no code implementations • 29 Jun 2022 • Andrew Koh, Eng Siong Chng

In this paper, we tackle the new Language-Based Audio Retrieval task proposed in DCASE 2022.

Retrieval

Paper
Add Code

Self-critical Sequence Training for Automatic Speech Recognition

no code implementations • 13 Apr 2022 • Chen Chen, Yuchen Hu, Nana Hou, Xiaofeng Qi, Heqing Zou, Eng Siong Chng

Although automatic speech recognition (ASR) task has gained remarkable success by sequence-to-sequence models, there are two main mismatches between its training and testing that might lead to performance degradation: 1) The typically used cross-entropy criterion aims to maximize log-likelihood of the training data, while the performance is evaluated by word error rate (WER), not log-likelihood; 2) The teacher-forcing method leads to the dependence on ground truth during training, which means that model has never been exposed to its own prediction before testing.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Rainbow Keywords: Efficient Incremental Learning for Online Spoken Keyword Spotting

1 code implementation • 30 Mar 2022 • Yang Xiao, Nana Hou, Eng Siong Chng

Catastrophic forgetting is a thorny challenge when updating keyword spotting (KWS) models after deployment.

Data Augmentation Incremental Learning +3

Paper
Code

Speech Emotion Recognition with Co-Attention based Multi-level Acoustic Information

1 code implementation • 29 Mar 2022 • Heqing Zou, Yuke Si, Chen Chen, Deepu Rajan, Eng Siong Chng

In this paper, we propose an end-to-end speech emotion recognition system using multi-level acoustic information with a newly designed co-attention module.

Speech Emotion Recognition

102

Paper
Code

Noise-robust Speech Recognition with 10 Minutes Unparalleled In-domain Data

no code implementations • 29 Mar 2022 • Chen Chen, Nana Hou, Yuchen Hu, Shashank Shirol, Eng Siong Chng

Noise-robust speech recognition systems require large amounts of training data including noisy speech data and corresponding transcripts to achieve state-of-the-art performances in face of various practical environments.

Generative Adversarial Network Robust Speech Recognition +1

Paper
Add Code

Interactive Audio-text Representation for Automated Audio Captioning with Contrastive Learning

no code implementations • 29 Mar 2022 • Chen Chen, Nana Hou, Yuchen Hu, Heqing Zou, Xiaofeng Qi, Eng Siong Chng

Automated Audio captioning (AAC) is a cross-modal task that generates natural language to describe the content of input audio.

Audio captioning Contrastive Learning

Paper
Add Code

Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition

1 code implementation • 28 Mar 2022 • Yuchen Hu, Nana Hou, Chen Chen, Eng Siong Chng

Then, we propose style learning to map the fused feature close to clean feature, in order to learn latent speech information from the latter, i. e., clean "speech style".

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Code

L-SpEx: Localized Target Speaker Extraction

1 code implementation • 21 Feb 2022 • Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li

Speaker extraction aims to extract the target speaker's voice from a multi-talker speech mixture given an auxiliary reference utterance.

Target Speaker Extraction

Paper
Code

A Unified Speaker Adaptation Approach for ASR

1 code implementation • EMNLP 2021 • Yingzhu Zhao, Chongjia Ni, Cheung-Chi Leung, Shafiq Joty, Eng Siong Chng, Bin Ma

For model adaptation, we use a novel gradual pruning method to adapt to target speakers without changing the model architecture, which to the best of our knowledge, has never been explored in ASR.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Code

Interactive Feature Fusion for End-to-End Noise-Robust Speech Recognition

2 code implementations • 11 Oct 2021 • Yuchen Hu, Nana Hou, Chen Chen, Eng Siong Chng

Speech enhancement (SE) aims to suppress the additive noise from a noisy speech signal to improve the speech's perceptual quality and intelligibility.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Code

Minimum word error training for non-autoregressive Transformer-based code-switching ASR

no code implementations • 7 Oct 2021 • Yizhou Peng, Jicheng Zhang, HaiHua Xu, Hao Huang, Eng Siong Chng

Non-autoregressive end-to-end ASR framework might be potentially appropriate for code-switching recognition task thanks to its inherent property that present output token being independent of historical ones.

Paper
Add Code

Automated Audio Captioning using Transfer Learning and Reconstruction Latent Space Similarity Regularization

no code implementations • 10 Aug 2021 • Andrew Koh, Fuzhao Xue, Eng Siong Chng

In this paper, we examine the use of Transfer Learning using Pretrained Audio Neural Networks (PANNs), and propose an architecture that is able to better leverage the acoustic features provided by PANNs for the Automated Audio Captioning Task.

Audio captioning Transfer Learning

Paper
Add Code

Multitask-Based Joint Learning Approach To Robust ASR For Radio Communication Speech

no code implementations • 22 Jul 2021 • Duo Ma, Nana Hou, Van Tung Pham, HaiHua Xu, Eng Siong Chng

One of the advantage of the proposed method is that the entire system can be trained from scratch.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

E2E-based Multi-task Learning Approach to Joint Speech and Accent Recognition

no code implementations • 15 Jun 2021 • Jicheng Zhang, Yizhou Peng, Pham Van Tung, HaiHua Xu, Hao Huang, Eng Siong Chng

In this paper, we propose a single multi-task learning framework to perform End-to-End (E2E) speech recognition (ASR) and accent recognition (AR) simultaneously.

Multi-Task Learning speech-recognition +1

Paper
Add Code

End-to-End Speaker Height and age estimation using Attention Mechanism with LSTM-RNN

no code implementations • 13 Jan 2021 • Manav Kaushik, Van Tung Pham, Eng Siong Chng

In this work, we propose a novel approach of using attention mechanism to build an end-to-end architecture for height and age estimation.

Age Estimation Multi-Task Learning

Paper
Add Code

An Embarrassingly Simple Model for Dialogue Relation Extraction

1 code implementation • 27 Dec 2020 • Fuzhao Xue, Aixin Sun, Hao Zhang, Jinjie Ni, Eng Siong Chng

Dialogue relation extraction (RE) is to predict the relation type of two entities mentioned in a dialogue.

Ranked #9 on Dialog Relation Extraction on DialogRE

Dialog Relation Extraction Relation +1

Paper
Code

GDPNet: Refining Latent Multi-View Graph for Relation Extraction

1 code implementation • 12 Dec 2020 • Fuzhao Xue, Aixin Sun, Hao Zhang, Eng Siong Chng

Recent advances on RE task are from BERT-based sequence modeling and graph-based modeling of relationships among the tokens in the sequence.

Ranked #4 on Dialog Relation Extraction on DialogRE (F1c (v1) metric)

Dialog Relation Extraction Dynamic Time Warping +2

Paper
Code

Multi-stage Speaker Extraction with Utterance and Frame-Level Reference Signals

no code implementations • 19 Nov 2020 • Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li

Speaker extraction requires a sample speech from the target speaker as the reference.

Paper
Add Code

Multilingual Approach to Joint Speech and Accent Recognition with DNN-HMM Framework

no code implementations • 22 Oct 2020 • Yizhou Peng, Jicheng Zhang, Haobo Zhang, HaiHua Xu, Hao Huang, Eng Siong Chng

Experimental results on an 8-accent English speech recognition show both methods can yield WERs close to the conventional ASR systems that completely ignore the accent, as well as desired AR accuracy.

speech-recognition Speech Recognition +1

Paper
Add Code

Adapting BERT for Word Sense Disambiguation with Gloss Selection Objective and Example Sentences

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Boon Peng Yap, Andrew Koh, Eng Siong Chng

Domain adaptation or transfer learning using pre-trained language models such as BERT has proven to be an effective approach for many natural language processing tasks.

Data Augmentation Domain Adaptation +3

Paper
Code

Leveraging Text Data Using Hybrid Transformer-LSTM Based End-to-End ASR in Transfer Learning

no code implementations • 21 May 2020 • Zhiping Zeng, Van Tung Pham, Hai-Hua Xu, Yerbolat Khassanov, Eng Siong Chng, Chongjia Ni, Bin Ma

To this end, we extend our prior work [1], and propose a hybrid Transformer-LSTM based architecture.

Cross-Lingual Transfer Language Modelling +1

Paper
Add Code

Approaches to Improving Recognition of Underrepresented Named Entities in Hybrid ASR Systems

no code implementations • 18 May 2020 • Tingzhi Mao, Yerbolat Khassanov, Van Tung Pham, Hai-Hua Xu, Hao Huang, Eng Siong Chng

In this paper, we present a series of complementary approaches to improve the recognition of underrepresented named entities (NE) in hybrid ASR systems without compromising overall word error rate performance.

Language Modelling

Paper
Add Code

SpEx+: A Complete Time Domain Speaker Extraction Network

no code implementations • 10 May 2020 • Meng Ge, Cheng-Lin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li

To eliminate such mismatch, we propose a complete time-domain speaker extraction solution, that is called SpEx+.

Ranked #1 on Speech Extraction on WSJ0-2mix-extr

Speech Extraction Audio and Speech Processing Sound

Paper
Add Code

Time-domain speaker extraction network

no code implementations • 29 Apr 2020 • Cheng-Lin Xu, Wei Rao, Eng Siong Chng, Haizhou Li

The inaccuracy of phase estimation is inherent to the frequency domain processing, that affects the quality of signal reconstruction.

Audio and Speech Processing Sound

Paper
Add Code

SpEx: Multi-Scale Time Domain Speaker Extraction Network

1 code implementation • 17 Apr 2020 • Cheng-Lin Xu, Wei Rao, Eng Siong Chng, Haizhou Li

Inspired by Conv-TasNet, we propose a time-domain speaker extraction network (SpEx) that converts the mixture speech into multi-scale embedding coefficients instead of decomposing the speech signal into magnitude and phase spectra.

Multi-Task Learning

144

Paper
Code

Independent language modeling architecture for end-to-end ASR

no code implementations • 25 Nov 2019 • Van Tung Pham, Hai-Hua Xu, Yerbolat Khassanov, Zhiping Zeng, Eng Siong Chng, Chongjia Ni, Bin Ma, Haizhou Li

To address this problem, in this work, we propose a new architecture that separates the decoder subnet from the encoder output.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences

no code implementations • 16 Apr 2019 • Kong Aik Lee, Ville Hautamaki, Tomi Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Hector Delgado, Jose Patino, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda, Trung Ngo Trong, Md Sahidullah, Fan Lu, Yun Tang, Ming Tu, Kah Kuan Teh, Huy Dat Tran, Kuruvachan K. George, Ivan Kukanov, Florent Desnous, Jichen Yang, Emre Yilmaz, Longting Xu, Jean-Francois Bonastre, Cheng-Lin Xu, Zhi Hao Lim, Eng Siong Chng, Shivesh Ranjan, John H. L. Hansen, Massimiliano Todisco, Nicholas Evans

The I4U consortium was established to facilitate a joint entry to NIST speaker recognition evaluations (SRE).

Domain Adaptation Speaker Recognition

Paper
Add Code

Constrained Output Embeddings for End-to-End Code-Switching Speech Recognition with Only Monolingual Data

no code implementations • 8 Apr 2019 • Yerbolat Khassanov, Hai-Hua Xu, Van Tung Pham, Zhiping Zeng, Eng Siong Chng, Chongjia Ni, Bin Ma

The lack of code-switch training data is one of the major concerns in the development of end-to-end code-switching automatic speech recognition (ASR) models.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Enriching Rare Word Representations in Neural Language Models by Embedding Matrix Augmentation

1 code implementation • 8 Apr 2019 • Yerbolat Khassanov, Zhiping Zeng, Van Tung Pham, Hai-Hua Xu, Eng Siong Chng

However, learning the representation of rare words is a challenging problem causing the NLM to produce unreliable probability estimates.

speech-recognition Speech Recognition

Paper
Code

Optimization of Speaker Extraction Neural Network with Magnitude and Temporal Spectrum Approximation Loss

1 code implementation • 24 Mar 2019 • Cheng-Lin Xu, Wei Rao, Eng Siong Chng, Haizhou Li

The SpeakerBeam-FE (SBF) method is proposed for speaker extraction.

144

Paper
Code

On the End-to-End Solution to Mandarin-English Code-switching Speech Recognition

1 code implementation • 1 Nov 2018 • Zhiping Zeng, Yerbolat Khassanov, Van Tung Pham, Hai-Hua Xu, Eng Siong Chng, Haizhou Li

Code-switching (CS) refers to a linguistic phenomenon where a speaker uses different languages in an utterance or between alternating utterances.

Data Augmentation Language Identification +3

Paper
Code

Named-Entity Tagging and Domain adaptation for Better Customized Translation

no code implementations • WS 2018 • Zhongwei Li, Xuancong Wang, Ai Ti Aw, Eng Siong Chng, Haizhou Li

Customized translation need pay spe-cial attention to the target domain ter-minology especially the named-entities for the domain.

Domain Adaptation Machine Translation +6

Paper
Add Code

Unsupervised and Efficient Vocabulary Expansion for Recurrent Neural Network Language Models in ASR

no code implementations • 27 Jun 2018 • Yerbolat Khassanov, Eng Siong Chng

Additionally, we propose to generate the list of OOS words to expand vocabulary in unsupervised manner by automatically extracting them from ASR output.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Study of Semi-supervised Approaches to Improving English-Mandarin Code-Switching Speech Recognition

no code implementations • 16 Jun 2018 • Pengcheng Guo, Hai-Hua Xu, Lei Xie, Eng Siong Chng

In this paper, we present our overall efforts to improve the performance of a code-switching speech recognition system using semi-supervised training methods from lexicon learning to acoustic modeling, on the South East Asian Mandarin-English (SEAME) data.

speech-recognition Speech Recognition

Paper
Add Code

Spoofing detection under noisy conditions: a preliminary investigation and an initial database

no code implementations • 9 Feb 2016 • Xiaohai Tian, Zhizheng Wu, Xiong Xiao, Eng Siong Chng, Haizhou Li

To simulate the real-life scenarios, we perform a preliminary investigation of spoofing detection under additive noisy conditions, and also describe an initial database for this task.

Speaker Verification