Search Results for author: Mingyu Cui

Found 29 papers, 7 papers with code

Structured Speaker-Deficiency Adaptation of Foundation Models for Dysarthric and Elderly Speech Recognition

no code implementations25 Dec 2024 Shujie Hu, Xurong Xie, Mengzhe Geng, Jiajun Deng, Zengrui Jin, Tianzi Wang, Mingyu Cui, Guinan Li, Zhaoqing Li, Helen Meng, Xunying Liu

Experiments on the UASpeech dysarthric and DementiaBank Pitt elderly speech corpora suggest structured speaker-deficiency adaptation of HuBERT and Wav2vec2-conformer models consistently outperforms baseline SFMs using either: a) no adapters; b) global adapters shared among all speakers; or c) single attribute adapters modelling speaker or deficiency labels alone by statistically significant WER reductions up to 3. 01% and 1. 50% absolute (10. 86% and 6. 94% relative) on the two tasks respectively.

Attribute speech-recognition +1

A Comparative Study of Discrete Speech Tokens for Semantic-Related Tasks with Large Language Models

no code implementations13 Nov 2024 Dingdong Wang, Mingyu Cui, Dongchao Yang, Xueyuan Chen, Helen Meng

With the rise of Speech Large Language Models (Speech LLMs), there has been growing interest in discrete speech tokens for their ability to integrate with text-based tokens seamlessly.

Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC

1 code implementation19 Sep 2024 Jiawen Kang, Lingwei Meng, Mingyu Cui, Yuejiao Wang, Xixin Wu, Xunying Liu, Helen Meng

SACTC is a tailored CTC variant for multi-talker scenarios, it explicitly models speaker disentanglement by constraining the encoder to represent different speakers' tokens at specific time frames.

Disentanglement speech-recognition +1

Exploring SSL Discrete Tokens for Multilingual ASR

no code implementations13 Sep 2024 Mingyu Cui, Daxin Tan, Yifan Yang, Dingdong Wang, Huimeng Wang, Xiao Chen, Xie Chen, Xunying Liu

With the advancement of Self-supervised Learning (SSL) in speech-related tasks, there has been growing interest in utilizing discrete tokens generated by SSL for automatic speech recognition (ASR), as they offer faster processing techniques.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Self-supervised ASR Models and Features For Dysarthric and Elderly Speech Recognition

no code implementations3 Jul 2024 Shujie Hu, Xurong Xie, Mengzhe Geng, Zengrui Jin, Jiajun Deng, Guinan Li, Yi Wang, Mingyu Cui, Tianzi Wang, Helen Meng, Xunying Liu

Experiments are conducted on four tasks: the English UASpeech and TORGO dysarthric speech corpora; and the English DementiaBank Pitt and Cantonese JCCOCC MoCA elderly speech datasets.

Alzheimer's Disease Detection Self-Supervised Learning +2

GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement

2 code implementations17 Jun 2024 Yifan Yang, Zheshu Song, Jianheng Zhuo, Mingyu Cui, Jinpeng Li, Bo Yang, Yexing Du, Ziyang Ma, Xunying Liu, Ziyuan Wang, Ke Li, Shuai Fan, Kai Yu, Wei-Qiang Zhang, Guoguo Chen, Xie Chen

Notably, ASR models trained on GigaSpeech 2 can reduce the word error rate for Thai, Indonesian, and Vietnamese on our challenging and realistic YouTube test set by 25% to 40% compared to Whisper large-v3, with merely 10% model parameters.

speech-recognition Speech Recognition

Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask

no code implementations14 Jun 2024 Tianzi Wang, Xurong Xie, Zhaoqing Li, Shoukang Hu, Zengrui Jin, Jiajun Deng, Mingyu Cui, Shujie Hu, Mengzhe Geng, Guinan Li, Helen Meng, Xunying Liu

This paper proposes a novel non-autoregressive (NAR) block-based Attention Mask Decoder (AMD) that flexibly balances performance-efficiency trade-offs for Conformer ASR systems.

Decoder

Cross-Speaker Encoding Network for Multi-Talker Speech Recognition

1 code implementation8 Jan 2024 Jiawen Kang, Lingwei Meng, Mingyu Cui, Haohan Guo, Xixin Wu, Xunying Liu, Helen Meng

To the best of our knowledge, this work represents an early effort to integrate SIMO and SISO for multi-talker speech recognition.

Decoder speech-recognition +1

Audio-visual End-to-end Multi-channel Speech Separation, Dereverberation and Recognition

no code implementations6 Jul 2023 Guinan Li, Jiajun Deng, Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Mingyu Cui, Helen Meng, Xunying Liu

Accurate recognition of cocktail party speech containing overlapping speakers, noise and reverberation remains a highly challenging task to date.

Speech Dereverberation Speech Separation

Factorised Speaker-environment Adaptive Training of Conformer Speech Recognition Systems

no code implementations26 Jun 2023 Jiajun Deng, Guinan Li, Xurong Xie, Zengrui Jin, Mingyu Cui, Tianzi Wang, Shujie Hu, Mengzhe Geng, Xunying Liu

Rich sources of variability in natural speech present significant challenges to current data intensive speech recognition technologies.

Diversity speech-recognition +2

Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator

no code implementations25 May 2023 Lingwei Meng, Jiawen Kang, Mingyu Cui, Haibin Wu, Xixin Wu, Helen Meng

Extending on this, we incorporate a diarization branch into the Sidecar, allowing for unified modeling of both ASR and diarization with a negligible overhead of only 768 parameters.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Use of Speech Impairment Severity for Dysarthric Speech Recognition

no code implementations18 May 2023 Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Jiajun Deng, Mingyu Cui, Guinan Li, Jianwei Yu, Xurong Xie, Xunying Liu

A key challenge in dysarthric speech recognition is the speaker-level diversity attributed to both speaker-identity associated factors such as gender, and speech impairment severity.

Diversity severity prediction +2

Exploring Self-supervised Pre-trained ASR Models For Dysarthric and Elderly Speech Recognition

no code implementations28 Feb 2023 Shujie Hu, Xurong Xie, Zengrui Jin, Mengzhe Geng, Yi Wang, Mingyu Cui, Jiajun Deng, Xunying Liu, Helen Meng

Experiments conducted on the UASpeech dysarthric and DementiaBank Pitt elderly speech corpora suggest TDNN and Conformer ASR systems integrated domain adapted wav2vec2. 0 models consistently outperform the standalone wav2vec2. 0 models by statistically significant WER reductions of 8. 22% and 3. 43% absolute (26. 71% and 15. 88% relative) on the two tasks respectively.

speech-recognition Speech Recognition

A Sidecar Separator Can Convert a Single-Talker Speech Recognition System to a Multi-Talker One

1 code implementation20 Feb 2023 Lingwei Meng, Jiawen Kang, Mingyu Cui, Yuejiao Wang, Xixin Wu, Helen Meng

Although automatic speech recognition (ASR) can perform well in common non-overlapping environments, sustaining performance in multi-talker overlapping speech recognition remains challenging.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems

1 code implementation15 Feb 2023 Jiajun Deng, Xurong Xie, Tianzi Wang, Mingyu Cui, Boyang Xue, Zengrui Jin, Guinan Li, Shujie Hu, Xunying Liu

Practical application of unsupervised model-based speaker adaptation techniques to data intensive end-to-end ASR systems is hindered by the scarcity of speaker-level data and performance sensitivity to transcription errors.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition

no code implementations15 Jun 2022 Shujie Hu, Xurong Xie, Mengzhe Geng, Mingyu Cui, Jiajun Deng, Guinan Li, Tianzi Wang, Xunying Liu, Helen Meng

Articulatory features are inherently invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition (ASR) systems designed for normal speech.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Exploiting Cross Domain Acoustic-to-articulatory Inverted Features For Disordered Speech Recognition

no code implementations19 Mar 2022 Shujie Hu, Shansong Liu, Xurong Xie, Mengzhe Geng, Tianzi Wang, Shoukang Hu, Mingyu Cui, Xunying Liu, Helen Meng

Articulatory features are inherently invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition (ASR) systems for normal speech.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Recent Progress in the CUHK Dysarthric Speech Recognition System

no code implementations15 Jan 2022 Shansong Liu, Mengzhe Geng, Shoukang Hu, Xurong Xie, Mingyu Cui, Jianwei Yu, Xunying Liu, Helen Meng

Despite the rapid progress of automatic speech recognition (ASR) technologies in the past few decades, recognition of disordered speech remains a highly challenging task to date.

Audio-Visual Speech Recognition Automatic Speech Recognition +4

TopicRefine: Joint Topic Prediction and Dialogue Response Generation for Multi-turn End-to-End Dialogue System

no code implementations11 Sep 2021 Hongru Wang, Mingyu Cui, Zimo Zhou, Gabriel Pui Cheong Fung, Kam-Fai Wong

A multi-turn dialogue always follows a specific topic thread, and topic shift at the discourse level occurs naturally as the conversation progresses, necessitating the model's ability to capture different topics and generate topic-aware responses.

Response Generation

Neural Architecture Search For LF-MMI Trained Time Delay Neural Networks

no code implementations17 Jul 2020 Shoukang Hu, Xurong Xie, Shansong Liu, Mingyu Cui, Mengzhe Geng, Xunying Liu, Helen Meng

Deep neural networks (DNNs) based automatic speech recognition (ASR) systems are often designed using expert knowledge and empirical evaluation.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Cannot find the paper you are looking for? You can Submit a new open access paper.