1 code implementation • 13 Mar 2024 • Jiayu Du, Jinpeng Li, Guoguo Chen, Wei-Qiang Zhang
In this paper we introduce the SpeechColab Leaderboard, a general-purpose, open-source platform designed for ASR evaluation.
no code implementations • 6 Oct 2023 • Ziyun Cui, Wen Wu, Wei-Qiang Zhang, Ji Wu, Chao Zhang
Apart from the knowledge from speech-generic representations, this paper also proposes to simultaneously transfer the knowledge from a speech depression detection task, based on the high comorbidity rates of depression and Alzheimer's disease (AD).
1 code implementation • 2 Jun 2023 • Haoyu Wang, Siyuan Wang, Wei-Qiang Zhang, Jinfeng Bai
Multilingual self-supervised speech representation models have greatly enhanced the speech recognition performance for low-resource languages, and the compression of these huge models has also become a crucial prerequisite for their industrial application.
no code implementations • 2 Jun 2023 • Haoyu Wang, Siyuan Wang, Wei-Qiang Zhang, Hongbin Suo, Yulong Wan
Self-supervised pre-trained models such as Wav2vec2, HuBERT, and WavLM have been shown to significantly improve many speech tasks.
no code implementations • 20 Apr 2023 • Hao Zhang, Nianwen Si, Yaqi Chen, Wenlin Zhang, Xukui Yang, Dan Qu, Wei-Qiang Zhang
However, the final model often performs worse on the MT task than the MT model trained alone, which means that the knowledge transfer ability of this method is also limited.
1 code implementation • 31 Mar 2023 • Anbai Jiang, Wei-Qiang Zhang, Yufeng Deng, Pingyi Fan, Jia Liu
Automatic detection of machine anomalies remains challenging for machine learning.
no code implementations • 14 Mar 2023 • Xuchu Chen, Yu Pu, Jinpeng Li, Wei-Qiang Zhang
We present our submission to the ICASSP-SPGC-2023 ADReSS-M Challenge Task, which aims to investigate which acoustic features can be generalized and transferred across languages for Alzheimer's Disease (AD) prediction.
no code implementations • 28 Jan 2023 • Yuzhen Qin, Li Sun, Hui Chen, Wei-Qiang Zhang, Wenming Yang, Jintao Fei, Guijin Wang
However, it is challenging to develop a single-lead-based ECG interpretation model for multi-disease diagnosis due to the lack of some key disease information.
1 code implementation • 5 Jan 2023 • Yutong Chen, Junhong Zhao, Wei-Qiang Zhang
Generating facial animation with high realism is in high demand, but it remains a challenging task.
no code implementations • 2 Nov 2022 • Xing Chen, Jie Wang, Xiao-Lei Zhang, Wei-Qiang Zhang, Kunde Yang
It utilizes score variation as an indicator to detect adversarial examples, where the score variation is the absolute discrepancy between the ASV scores of an original audio recording and its transformed audio synthesized from its masked complex spectrogram.
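The score-variation indicator described above can be sketched in a few lines. This is only an illustration under stated assumptions: `asv_score` and `transform` are hypothetical callables supplied by the caller, the toy scorer and toy mask stand in for a real ASV system and for resynthesis from a masked complex spectrogram, and the decision threshold is an assumed parameter that the abstract does not specify.

```python
from typing import Callable, List

Audio = List[float]


def score_variation(
    asv_score: Callable[[Audio, Audio], float],
    transform: Callable[[Audio], Audio],
    enrollment: Audio,
    test_audio: Audio,
) -> float:
    """Absolute difference between the ASV score of the original
    recording and the ASV score of its transformed version."""
    original = asv_score(enrollment, test_audio)
    transformed = asv_score(enrollment, transform(test_audio))
    return abs(original - transformed)


def is_adversarial(variation: float, threshold: float) -> bool:
    """Flag the input when the variation exceeds an assumed threshold."""
    return variation > threshold


# Toy stand-ins (assumptions, not the paper's components).
def toy_score(a: Audio, b: Audio) -> float:
    # Dot product used as a placeholder similarity score.
    return sum(x * y for x, y in zip(a, b))


def toy_mask(x: Audio) -> Audio:
    # Zero out every second sample as a placeholder for masking.
    return [v if i % 2 == 0 else 0.0 for i, v in enumerate(x)]


enroll = [1.0, 1.0, 1.0, 1.0]
test = [1.0, 2.0, 3.0, 4.0]
variation = score_variation(toy_score, toy_mask, enroll, test)
flagged = is_adversarial(variation, threshold=5.0)
```

In practice the threshold would have to be tuned on held-out genuine and adversarial recordings; the value used here is arbitrary.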
no code implementations • 30 Oct 2022 • Jiadi Yao, Xing Chen, Xiao-Lei Zhang, Wei-Qiang Zhang, Kunde Yang
To our knowledge, adversarial attack approaches to speaker identification either incur high computational cost or are not very effective.
no code implementations • 27 Oct 2022 • Yujin Wang, Changli Tang, Ziyang Ma, Zhisheng Zheng, Xie Chen, Wei-Qiang Zhang
Recent years have witnessed great strides in self-supervised learning (SSL) for speech processing.
no code implementations • 13 Oct 2022 • Haoyu Wang, Wei-Qiang Zhang, Hongbin Suo, Yulong Wan
Labeled audio data is insufficient to build satisfactory speech recognition systems for most of the languages in the world.
no code implementations • 12 Oct 2022 • Shuhao Deng, Chengfei Li, Jinfeng Bai, Qingqing Zhang, Wei-Qiang Zhang, Runyan Yang, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan
Code-switching automatic speech recognition has become one of the most challenging and most valuable scenarios of automatic speech recognition, due to the code-switching phenomenon between multiple languages and its frequent occurrence in daily life.
no code implementations • 29 Jun 2022 • Jing Zhao, Haoyu Wang, Jinpeng Li, Shuzhou Chai, Guan-Bo Wang, Guoguo Chen, Wei-Qiang Zhang
For the Constrained training condition, we construct our basic ASR system based on the standard hybrid architecture.
1 code implementation • 1 Mar 2022 • Yuting Nie, Junhong Zhao, Wei-Qiang Zhang, Jinfeng Bai
It has a profound impact on the multilingual interoperability of an intelligent speech system.
no code implementations • 27 Aug 2021 • Yuzi Yan, Wei-Qiang Zhang, Michael T. Johnson
As the cornerstone of other important technologies, such as speech recognition and speech synthesis, speech enhancement is a critical area in audio signal processing.
no code implementations • 6 Jul 2021 • Yuzi Yan, Xu Tan, Bohan Li, Guangyan Zhang, Tao Qin, Sheng Zhao, Yuan Shen, Wei-Qiang Zhang, Tie-Yan Liu
While recent text-to-speech (TTS) models perform very well in synthesizing reading-style (e.g., audiobook) speech, it is still challenging to synthesize spontaneous-style speech (e.g., podcast or conversation), mainly for two reasons: 1) the lack of training data for spontaneous speech; 2) the difficulty in modeling the filled pauses (um and uh) and diverse rhythms in spontaneous speech.
1 code implementation • ACL 2021 • Lanqing Xue, Kaitao Song, Duocai Wu, Xu Tan, Nevin L. Zhang, Tao Qin, Wei-Qiang Zhang, Tie-Yan Liu
In this paper, we develop DeepRapper, a Transformer-based rap generation system that can model both rhymes and rhythms.
2 code implementations • 13 Jun 2021 • Guoguo Chen, Shuzhou Chai, Guanbo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Yujun Wang, Zhao You, Zhiyong Yan
This paper introduces GigaSpeech, an evolving, multi-domain English speech recognition corpus with 10,000 hours of high quality labeled audio suitable for supervised training, and 40,000 hours of total audio suitable for semi-supervised and unsupervised training.
Ranked #1 on Speech Recognition on GigaSpeech
no code implementations • 25 Dec 2019 • Yi Liu, Tianyu Liang, Can Xu, Xianwei Zhang, Xianhong Chen, Wei-Qiang Zhang, Liang He, Dandan Song, Ruyun Li, Yangcheng Wu, Peng Ouyang, Shouyi Yin
This paper describes the systems submitted by the Department of Electronic Engineering, Institute of Microelectronics of Tsinghua University and TsingMicro Co. Ltd. (THUEE) to the NIST 2019 Speaker Recognition Evaluation CTS challenge.
no code implementations • 3 Oct 2018 • Yu-Han Shen, Ke-Xin He, Wei-Qiang Zhang
To tackle this task, we propose a gated convolutional neural network with a segment-level attention mechanism (SAM-GCNN).