1 code implementation • 26 Mar 2025 • Yangyang Meng, Jinpeng Li, Guodong Lin, Yu Pu, Guanbo Wang, Hu Du, Zhiming Shao, YuKai Huang, Ke Li, Wei-Qiang Zhang
This report introduces Dolphin, a large-scale multilingual automatic speech recognition (ASR) model that extends the Whisper architecture to support a wider range of languages.
Automatic Speech Recognition (ASR)
no code implementations • 12 Jan 2025 • Yu Pu, Wei-Qiang Zhang
By leveraging speech analysis as a non-invasive and cost-effective tool for Alzheimer's disease (AD) detection, our research contributes to early diagnosis and improved management of this debilitating disease.
no code implementations • 30 Dec 2024 • Zixiang Wan, Ziyue Qiu, Yiyang Liu, Wei-Qiang Zhang
Speech Emotion Recognition (SER) involves analyzing vocal expressions to determine the emotional state of speakers, where the comprehensive and thorough utilization of audio information is paramount.
no code implementations • 30 Dec 2024 • Zhi Chen, Yun-Fei Shao, Yong Ma, Mingsheng Wei, Le Zhang, Wei-Qiang Zhang
Acoustic Scene Classification (ASC) identifies an environment based on an audio signal.
no code implementations • 11 Sep 2024 • Xinhu Zheng, Anbai Jiang, Bing Han, Yanmin Qian, Pingyi Fan, Jia Liu, Wei-Qiang Zhang
Anomalous Sound Detection (ASD) has gained significant interest through the application of various Artificial Intelligence (AI) technologies in industrial settings.
no code implementations • 27 Aug 2024 • Anbai Jiang, Yuchen Shi, Pingyi Fan, Wei-Qiang Zhang, Jia Liu
Machine anomalous sound detection (ASD) has emerged as one of the most promising applications in the Industrial Internet of Things (IIoT) due to its unprecedented efficacy in mitigating risks of malfunctions and promoting production efficiency.
no code implementations • 10 Aug 2024 • Jinpeng Li, Yu Pu, Qi Sun, Wei-Qiang Zhang
It is worth investigating how low-cost data can be utilized to improve the performance of Whisper on under-represented languages.
2 code implementations • 17 Jun 2024 • Yifan Yang, Zheshu Song, Jianheng Zhuo, Mingyu Cui, Jinpeng Li, Bo Yang, Yexing Du, Ziyang Ma, Xunying Liu, Ziyuan Wang, Ke Li, Shuai Fan, Kai Yu, Wei-Qiang Zhang, Guoguo Chen, Xie Chen
Notably, ASR models trained on GigaSpeech 2 can reduce the word error rate for Thai, Indonesian, and Vietnamese on our challenging and realistic YouTube test set by 25% to 40% compared to Whisper large-v3, with merely 10% of the model parameters.
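For orientation, the word error rate (WER) reported above is the word-level edit distance between a reference transcript and a hypothesis, normalized by the reference length; a minimal sketch (the function and example strings are illustrative, not taken from the paper):

```python
# Minimal word error rate: Levenshtein distance over word tokens / reference length.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for the edit distance between the two word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the cat sat on the mat", "the cat sat mat"))  # 2 errors / 6 words ≈ 0.33
```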
1 code implementation • 14 Jun 2024 • Haoyu Wang, Guoqiang Hu, Guodong Lin, Wei-Qiang Zhang, Jian Li
As a robust and large-scale multilingual speech recognition model, Whisper has demonstrated impressive results in many low-resource and out-of-distribution scenarios.
1 code implementation • 13 Mar 2024 • Jiayu Du, Jinpeng Li, Guoguo Chen, Wei-Qiang Zhang
In this paper we introduce the SpeechColab Leaderboard, a general-purpose, open-source platform designed for ASR evaluation.
Automatic Speech Recognition (ASR)
no code implementations • 6 Oct 2023 • Ziyun Cui, Wen Wu, Wei-Qiang Zhang, Ji Wu, Chao Zhang
In addition to the knowledge from speech-generic representations, this paper proposes to simultaneously transfer knowledge from a speech depression detection task, motivated by the high comorbidity rates of depression and AD.
no code implementations • 2 Jun 2023 • Haoyu Wang, Siyuan Wang, Wei-Qiang Zhang, Hongbin Suo, Yulong Wan
Self-supervised pre-trained models such as Wav2vec2, Hubert, and WavLM have been shown to significantly improve many speech tasks.
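As an illustration of how such pre-trained models are commonly used as frozen feature extractors for downstream speech tasks, here is a generic sketch via Hugging Face transformers; the checkpoint name is only an example and this is not the paper's specific setup:

```python
# Sketch: frame-level speech representations from a pre-trained wav2vec 2.0 model
# (generic usage with Hugging Face transformers, not the paper's configuration).
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base").eval()

waveform = torch.randn(16000)  # placeholder: 1 second of 16 kHz audio
inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    features = model(**inputs).last_hidden_state  # shape: (1, num_frames, 768)
print(features.shape)
```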
1 code implementation • 2 Jun 2023 • Haoyu Wang, Siyuan Wang, Wei-Qiang Zhang, Jinfeng Bai
Multilingual self-supervised speech representation models have greatly enhanced speech recognition performance for low-resource languages, and compressing these huge models has also become a crucial prerequisite for their industrial application.
no code implementations • 20 Apr 2023 • Hao Zhang, Nianwen Si, Yaqi Chen, Wenlin Zhang, Xukui Yang, Dan Qu, Wei-Qiang Zhang
However, the final model often performs worse on the machine translation (MT) task than an MT model trained alone, which means that the knowledge transfer ability of this method is also limited.
1 code implementation • 31 Mar 2023 • Anbai Jiang, Wei-Qiang Zhang, Yufeng Deng, Pingyi Fan, Jia Liu
Automatic detection of machine anomalies remains challenging for machine learning.
no code implementations • 14 Mar 2023 • Xuchu Chen, Yu Pu, Jinpeng Li, Wei-Qiang Zhang
We present our submission to the ICASSP-SPGC-2023 ADReSS-M Challenge Task, which aims to investigate which acoustic features can be generalized and transferred across languages for Alzheimer's Disease (AD) prediction.
no code implementations • 28 Jan 2023 • Yuzhen Qin, Li Sun, Hui Chen, Wei-Qiang Zhang, Wenming Yang, Jintao Fei, Guijin Wang
However, it is challenging to develop a single-lead ECG interpretation model for multi-disease diagnosis due to the lack of some key disease information.
1 code implementation • 5 Jan 2023 • Yutong Chen, Junhong Zhao, Wei-Qiang Zhang
Generating facial animation with high realism is in great demand, but it remains a challenging task.
no code implementations • 2 Nov 2022 • Xing Chen, Jie Wang, Xiao-Lei Zhang, Wei-Qiang Zhang, Kunde Yang
It utilizes score variation as an indicator to detect adversarial examples, where the score variation is the absolute discrepancy between the ASV scores of an original audio recording and its transformed audio synthesized from its masked complex spectrogram.
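A schematic of the score-variation check described above (a sketch under stated assumptions: `asv_score`, the random bin masking, and the threshold are placeholders and may differ from the paper's actual masking and resynthesis procedure):

```python
# Sketch of score-variation detection: compare the ASV score of the original audio
# with the score of audio resynthesized from a masked complex spectrogram.
# asv_score(), the masking scheme, and the threshold are illustrative placeholders.
import numpy as np
import librosa

def resynthesize_with_mask(audio, n_fft=512, hop=128, mask_ratio=0.2):
    spec = librosa.stft(audio, n_fft=n_fft, hop_length=hop)  # complex spectrogram
    keep = np.random.rand(*spec.shape) >= mask_ratio          # randomly zero some bins
    return librosa.istft(spec * keep, hop_length=hop, length=len(audio))

def looks_adversarial(audio, enrolled_speaker, asv_score, threshold=0.15):
    s_orig = asv_score(audio, enrolled_speaker)
    s_masked = asv_score(resynthesize_with_mask(audio), enrolled_speaker)
    # A genuine recording should keep a similar score; a large variation flags an attack.
    return abs(s_orig - s_masked) > threshold
```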
no code implementations • 30 Oct 2022 • Jiadi Yao, Xing Chen, Xiao-Lei Zhang, Wei-Qiang Zhang, Kunde Yang
To our knowledge, adversarial attack approaches to speaker identification either require high computational cost or are not very effective.
no code implementations • 27 Oct 2022 • Yujin Wang, Changli Tang, Ziyang Ma, Zhisheng Zheng, Xie Chen, Wei-Qiang Zhang
Recent years have witnessed great strides in self-supervised learning (SSL) for speech processing.
Automatic Speech Recognition (ASR)
no code implementations • 13 Oct 2022 • Haoyu Wang, Wei-Qiang Zhang, Hongbin Suo, Yulong Wan
Labeled audio data is insufficient to build satisfactory speech recognition systems for most of the languages in the world.
no code implementations • 12 Oct 2022 • Shuhao Deng, Chengfei Li, Jinfeng Bai, Qingqing Zhang, Wei-Qiang Zhang, Runyan Yang, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan
Code-switching automatic speech recognition has become one of the most challenging and most valuable scenarios for automatic speech recognition, owing to the switching between multiple languages and the frequent occurrence of code-switching in daily life.
Automatic Speech Recognition (ASR)
no code implementations • 29 Jun 2022 • Jing Zhao, Haoyu Wang, Jinpeng Li, Shuzhou Chai, Guan-Bo Wang, Guoguo Chen, Wei-Qiang Zhang
For the Constrained training condition, we construct our basic ASR system based on the standard hybrid architecture.
Automatic Speech Recognition (ASR)
1 code implementation • 1 Mar 2022 • Yuting Nie, Junhong Zhao, Wei-Qiang Zhang, Jinfeng Bai
It has a profound impact on the multilingual interoperability of an intelligent speech system.
no code implementations • 27 Aug 2021 • Yuzi Yan, Wei-Qiang Zhang, Michael T. Johnson
As the cornerstone of other important technologies, such as speech recognition and speech synthesis, speech enhancement is a critical area in audio signal processing.
no code implementations • 6 Jul 2021 • Yuzi Yan, Xu Tan, Bohan Li, Guangyan Zhang, Tao Qin, Sheng Zhao, Yuan Shen, Wei-Qiang Zhang, Tie-Yan Liu
While recent text-to-speech (TTS) models perform very well in synthesizing reading-style (e.g., audiobook) speech, it is still challenging to synthesize spontaneous-style speech (e.g., podcast or conversation), mainly for two reasons: 1) the lack of training data for spontaneous speech; 2) the difficulty in modeling the filled pauses (um and uh) and diverse rhythms in spontaneous speech.
1 code implementation • ACL 2021 • Lanqing Xue, Kaitao Song, Duocai Wu, Xu Tan, Nevin L. Zhang, Tao Qin, Wei-Qiang Zhang, Tie-Yan Liu
In this paper, we develop DeepRapper, a Transformer-based rap generation system that can model both rhymes and rhythms.
3 code implementations • 13 Jun 2021 • Guoguo Chen, Shuzhou Chai, Guanbo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Yujun Wang, Zhao You, Zhiyong Yan
This paper introduces GigaSpeech, an evolving, multi-domain English speech recognition corpus with 10,000 hours of high-quality labeled audio suitable for supervised training, and 40,000 hours of total audio suitable for semi-supervised and unsupervised training.
Ranked #1 on Speech Recognition on GigaSpeech
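For readers who want to try the corpus, a hedged loading sketch: the Hugging Face repository id, the subset name, and the column names below are assumptions about the public hosting rather than part of the paper, and access requires accepting the dataset's terms and authenticating with a Hugging Face token.

```python
# Sketch: streaming a GigaSpeech subset from the Hugging Face hub.
# Repository id, subset name ("xs"), and column names are assumptions about the
# hosting; the dataset is gated, so a Hugging Face token is required.
from datasets import load_dataset

gs = load_dataset("speechcolab/gigaspeech", "xs", split="train", streaming=True)
sample = next(iter(gs))
print(sample["text"], sample["audio"]["sampling_rate"])
```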
no code implementations • 25 Dec 2019 • Yi Liu, Tianyu Liang, Can Xu, Xianwei Zhang, Xianhong Chen, Wei-Qiang Zhang, Liang He, Dandan Song, Ruyun Li, Yangcheng Wu, Peng Ouyang, Shouyi Yin
This paper describes the systems submitted by the Department of Electronic Engineering, Institute of Microelectronics of Tsinghua University, and TsingMicro Co. Ltd. (THUEE) to the NIST 2019 speaker recognition evaluation CTS challenge.
no code implementations • 3 Oct 2018 • Yu-Han Shen, Ke-Xin He, Wei-Qiang Zhang
To tackle this task, we propose a gated convolutional neural network with segment-level attention mechanism (SAM-GCNN).
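A rough sketch of the two components named above, a gated convolution block and segment-level attention pooling (channel counts and the exact attention form are illustrative placeholders, not the paper's configuration):

```python
# Sketch of a gated convolution block and segment-level attention pooling,
# the two ingredients named in SAM-GCNN. Layer sizes are placeholders.
import torch
import torch.nn as nn

class GatedConv2d(nn.Module):
    """Gated convolution: features modulated by a learned sigmoid gate."""
    def __init__(self, in_ch, out_ch, kernel=3):
        super().__init__()
        self.feat = nn.Conv2d(in_ch, out_ch, kernel, padding=kernel // 2)
        self.gate = nn.Conv2d(in_ch, out_ch, kernel, padding=kernel // 2)

    def forward(self, x):
        return torch.tanh(self.feat(x)) * torch.sigmoid(self.gate(x))

class SegmentAttentionPool(nn.Module):
    """Weight each time segment by a learned attention score before pooling."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):                        # x: (batch, segments, dim)
        w = torch.softmax(self.score(x), dim=1)  # attention weights over segments
        return (w * x).sum(dim=1)                # (batch, dim)

# Toy usage on a log-mel-like input of shape (batch, 1, time, mel).
feats = GatedConv2d(1, 32)(torch.randn(2, 1, 100, 64))
segments = feats.mean(dim=3).transpose(1, 2)     # (batch, time, channels)
clip_embedding = SegmentAttentionPool(32)(segments)
print(clip_embedding.shape)                      # torch.Size([2, 32])
```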