Search Results for author: Wei-Qiang Zhang

Found 22 papers, 7 papers with code

Transferring speech-generic and depression-specific knowledge for Alzheimer's disease detection

no code implementations • 6 Oct 2023 • Ziyun Cui, Wen Wu, Wei-Qiang Zhang, Ji Wu, Chao Zhang

Apart from the knowledge from speech-generic representations, this paper also proposes to simultaneously transfer the knowledge from a speech depression detection task based on the high comorbidity rates of depression and AD.

Alzheimer's Disease Detection, Depression Detection

DistilXLSR: A Light Weight Cross-Lingual Speech Representation Model

1 code implementation • 2 Jun 2023 • Haoyu Wang, Siyuan Wang, Wei-Qiang Zhang, Jinfeng Bai

Multilingual self-supervised speech representation models have greatly enhanced the speech recognition performance for low-resource languages, and the compression of these huge models has also become a crucial prerequisite for their industrial application.

Speech Recognition

Task-Agnostic Structured Pruning of Speech Representation Models

no code implementations • 2 Jun 2023 • Haoyu Wang, Siyuan Wang, Wei-Qiang Zhang, Hongbin Suo, Yulong Wan

Self-supervised pre-trained models such as Wav2vec2, Hubert, and WavLM have been shown to significantly improve many speech tasks.

Model Compression

Improving Speech Translation by Cross-Modal Multi-Grained Contrastive Learning

no code implementations • 20 Apr 2023 • Hao Zhang, Nianwen Si, Yaqi Chen, Wenlin Zhang, Xukui Yang, Dan Qu, Wei-Qiang Zhang

However, the final model often performs worse on the MT task than the MT model trained alone, which means that the knowledge transfer ability of this method is also limited.

Contrastive Learning, Machine Translation

Cross-lingual Alzheimer's Disease detection based on paralinguistic and pre-trained features

no code implementations • 14 Mar 2023 • Xuchu Chen, Yu Pu, Jinpeng Li, Wei-Qiang Zhang

We present our submission to the ICASSP-SPGC-2023 ADReSS-M Challenge Task, which aims to investigate which acoustic features can be generalized and transferred across languages for Alzheimer's Disease (AD) prediction.

Alzheimer's Disease Detection

MVKT-ECG: Efficient Single-lead ECG Classification on Multi-Label Arrhythmia by Multi-View Knowledge Transferring

no code implementations • 28 Jan 2023 • Yuzhen Qin, Li Sun, Hui Chen, Wei-Qiang Zhang, Wenming Yang, Jintao Fei, Guijin Wang

However, it is challenging to develop a single-lead-based ECG interpretation model for multi-disease diagnosis due to the lack of some key disease information.

ECG Classification, Knowledge Distillation

Expressive Speech-driven Facial Animation with controllable emotions

1 code implementation • 5 Jan 2023 • Yutong Chen, Junhong Zhao, Wei-Qiang Zhang

Generating highly realistic facial animation is in high demand, but it remains a challenging task.

LMD: A Learnable Mask Network to Detect Adversarial Examples for Speaker Verification

no code implementations • 2 Nov 2022 • Xing Chen, Jie Wang, Xiao-Lei Zhang, Wei-Qiang Zhang, Kunde Yang

It utilizes score variation as an indicator to detect adversarial examples, where the score variation is the absolute discrepancy between the ASV scores of an original audio recording and its transformed audio synthesized from its masked complex spectrogram.

Speaker Verification

Symmetric Saliency-based Adversarial Attack To Speaker Identification

no code implementations • 30 Oct 2022 • Jiadi Yao, Xing Chen, Xiao-Lei Zhang, Wei-Qiang Zhang, Kunde Yang

To our knowledge, existing adversarial attack approaches to speaker identification either incur high computational cost or are not very effective.

Adversarial Attack, Speaker Identification

Summary on the ISCSLP 2022 Chinese-English Code-Switching ASR Challenge

no code implementations • 12 Oct 2022 • Shuhao Deng, Chengfei Li, Jinfeng Bai, Qingqing Zhang, Wei-Qiang Zhang, Runyan Yang, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan

Code-switching automatic speech recognition has become one of the most challenging and most valuable scenarios of automatic speech recognition, due to the switching between languages within speech and the frequent occurrence of code-switching in daily life.

Automatic Speech Recognition (ASR)

Full Attention Bidirectional Deep Learning Structure for Single Channel Speech Enhancement

no code implementations • 27 Aug 2021 • Yuzi Yan, Wei-Qiang Zhang, Michael T. Johnson

As the cornerstone of other important technologies, such as speech recognition and speech synthesis, speech enhancement is a critical area in audio signal processing.

Audio Signal Processing, Speech Enhancement

AdaSpeech 3: Adaptive Text to Speech for Spontaneous Style

no code implementations • 6 Jul 2021 • Yuzi Yan, Xu Tan, Bohan Li, Guangyan Zhang, Tao Qin, Sheng Zhao, Yuan Shen, Wei-Qiang Zhang, Tie-Yan Liu

While recent text to speech (TTS) models perform very well in synthesizing reading-style (e.g., audiobook) speech, it is still challenging to synthesize spontaneous-style speech (e.g., podcast or conversation), mainly for two reasons: 1) the lack of training data for spontaneous speech; 2) the difficulty in modeling the filled pauses (um and uh) and diverse rhythms in spontaneous speech.

DeepRapper: Neural Rap Generation with Rhyme and Rhythm Modeling

1 code implementation • ACL 2021 • Lanqing Xue, Kaitao Song, Duocai Wu, Xu Tan, Nevin L. Zhang, Tao Qin, Wei-Qiang Zhang, Tie-Yan Liu

In this paper, we develop DeepRapper, a Transformer-based rap generation system that can model both rhymes and rhythms.

Language Modelling

GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio

2 code implementations • 13 Jun 2021 • Guoguo Chen, Shuzhou Chai, Guanbo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Yujun Wang, Zhao You, Zhiyong Yan

This paper introduces GigaSpeech, an evolving, multi-domain English speech recognition corpus with 10,000 hours of high-quality labeled audio suitable for supervised training, and 40,000 hours of total audio suitable for semi-supervised and unsupervised training.

Sentence, Speech Recognition

THUEE system description for NIST 2019 SRE CTS Challenge

no code implementations • 25 Dec 2019 • Yi Liu, Tianyu Liang, Can Xu, Xianwei Zhang, Xianhong Chen, Wei-Qiang Zhang, Liang He, Dandan Song, Ruyun Li, Yangcheng Wu, Peng Ouyang, Shouyi Yin

This paper describes the systems submitted by the Department of Electronic Engineering, Institute of Microelectronics of Tsinghua University and TsingMicro Co. Ltd. (THUEE) to the NIST 2019 speaker recognition evaluation CTS challenge.

Speaker Recognition

SAM-GCNN: A Gated Convolutional Neural Network with Segment-Level Attention Mechanism for Home Activity Monitoring

no code implementations • 3 Oct 2018 • Yu-Han Shen, Ke-Xin He, Wei-Qiang Zhang

To tackle this task, we propose a gated convolutional neural network with segment-level attention mechanism (SAM-GCNN).

Home Activity Monitoring
