Search Results for author: Zhengqi Wen

Found 28 papers, 3 papers with code

Learning From Yourself: A Self-Distillation Method for Fake Speech Detection

no code implementations • 2 Mar 2023 • Jun Xue, Cunhang Fan, Jiangyan Yi, Chenglong Wang, Zhengqi Wen, Dan Zhang, Zhao Lv

To address this problem, we propose using the deepest network to instruct the shallow networks, thereby enhancing them.
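The excerpt does not spell out the training objective. Here is a minimal sketch. It assumes a multi-exit classifier in which the deepest exit acts as the teacher, and each shallower exit is pulled toward it with a temperature-softened KL divergence, a common knowledge-distillation form. The temperature, the T² scaling, and the helper names are hypothetical and are not taken from the paper:

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def self_distillation_loss(exit_logits: List[List[float]], temperature: float = 3.0) -> float:
    """Sum of KL terms pulling every shallow exit toward the deepest exit.

    exit_logits[-1] holds the deepest (teacher) exit; earlier entries are the
    shallow (student) exits. Hypothetical helper: the T**2 factor follows common
    knowledge-distillation practice, not necessarily the paper.
    """
    teacher = softmax(exit_logits[-1], temperature)
    loss = 0.0
    for student_logits in exit_logits[:-1]:
        student = softmax(student_logits, temperature)
        loss += kl_divergence(teacher, student) * temperature ** 2
    return loss
```

In training, this term would typically be added to each exit's ordinary classification loss. The teacher distribution would be treated as a constant, with no gradient flowing into it.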

UnifySpeech: A Unified Framework for Zero-shot Text-to-Speech and Voice Conversion

no code implementations • 10 Jan 2023 • Haogeng Liu, Tao Wang, Ruibo Fu, Jiangyan Yi, Zhengqi Wen, JianHua Tao

Text-to-speech (TTS) and voice conversion (VC) are two different tasks, both aiming to generate high-quality speech from different input modalities.

Tasks: Quantization, Voice Conversion

Emotion Selectable End-to-End Text-based Speech Editing

no code implementations • 20 Dec 2022 • Tao Wang, Jiangyan Yi, Ruibo Fu, JianHua Tao, Zhengqi Wen, Chu Yuan Zhang

To achieve this task, we propose Emo-CampNet (emotion CampNet), which lets the user select emotional attributes for the generated speech in text-based speech editing and can edit unseen speakers' speech in a one-shot setting.

Tasks: Data Augmentation

Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features

no code implementations • 2 Aug 2022 • Jun Xue, Cunhang Fan, Zhao Lv, JianHua Tao, Jiangyan Yi, Chengshi Zheng, Zhengqi Wen, Minmin Yuan, Shegang Shao

Meanwhile, to make full use of the phase and full-band information, we also propose using real and imaginary spectrogram features as complementary input features and modeling the disjoint subbands separately.

Tasks: DeepFake Detection, Face Swapping
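To make the real-plus-imaginary input concrete, here is a minimal sketch. It assumes a Hann-windowed short-time DFT, hypothetical frame and hop sizes, and equal-width subbands. A naive O(N²) DFT is used only to stay dependency-free; `numpy.fft.rfft` would be the usual choice. How the paper fuses these features with F0 is not shown:

```python
import cmath
import math
from typing import List, Tuple


def real_imag_spectrogram(signal: List[float], frame_len: int = 8,
                          hop: int = 4) -> Tuple[List[List[float]], List[List[float]]]:
    """Return (real, imag) spectrograms, each shaped [num_frames][frame_len // 2 + 1]."""
    # Periodic Hann window; frame_len and hop are hypothetical illustration values.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    num_bins = frame_len // 2 + 1  # non-negative frequencies of a real signal
    real, imag = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = [
            sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len) for n in range(frame_len))
            for k in range(num_bins)
        ]
        # Keeping real and imaginary parts retains phase, unlike a magnitude spectrogram.
        real.append([c.real for c in spectrum])
        imag.append([c.imag for c in spectrum])
    return real, imag


def split_subbands(spec: List[List[float]], num_bands: int) -> List[List[List[float]]]:
    """Split the frequency axis into disjoint, contiguous subbands (the last band takes any remainder)."""
    num_bins = len(spec[0])
    if not 1 <= num_bands <= num_bins:
        raise ValueError("num_bands must be between 1 and the number of frequency bins")
    width = num_bins // num_bands
    bands = []
    for b in range(num_bands):
        lo = b * width
        hi = num_bins if b == num_bands - 1 else lo + width
        bands.append([frame[lo:hi] for frame in spec])
    return bands
```

Each subband of both the real and imaginary spectrograms could then be fed to its own branch of a model, which is one way to "model the disjoint subbands separately".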

NeuralDPS: Neural Deterministic Plus Stochastic Model with Multiband Excitation for Noise-Controllable Waveform Generation

no code implementations • 5 Mar 2022 • Tao Wang, Ruibo Fu, Jiangyan Yi, JianHua Tao, Zhengqi Wen

Experiments also verify that this method can effectively control the noise components in the predicted speech and adjust its signal-to-noise ratio (SNR).
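For reference, SNR in decibels is 10·log10(P_signal / P_noise). The sketch below assumes, hypothetically, that the output waveform is a deterministic part plus a stochastic (noise) part, as the model's name suggests. It shows how a single gain on the stochastic part sets a target SNR. This is a generic illustration, not the NeuralDPS control mechanism itself:

```python
import math
from typing import List


def power(x: List[float]) -> float:
    """Mean power of a signal."""
    return sum(v * v for v in x) / len(x)


def snr_db(deterministic: List[float], stochastic: List[float]) -> float:
    """SNR in dB: 10 * log10(P_deterministic / P_stochastic)."""
    return 10.0 * math.log10(power(deterministic) / power(stochastic))


def mix_at_snr(deterministic: List[float], stochastic: List[float],
               target_snr_db: float) -> List[float]:
    """Scale the stochastic part by a gain g so that deterministic + g * stochastic hits the target SNR."""
    # P_d / (g^2 * P_s) = 10^(SNR/10)  =>  g = sqrt(P_d / (P_s * 10^(SNR/10)))
    gain = math.sqrt(power(deterministic) / (power(stochastic) * 10 ** (target_snr_db / 10.0)))
    return [d + gain * s for d, s in zip(deterministic, stochastic)]
```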

CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing

1 code implementation • 21 Feb 2022 • Tao Wang, Jiangyan Yi, Ruibo Fu, JianHua Tao, Zhengqi Wen

It can correct unnatural prosody in the edited region and synthesize speech corresponding to unseen words in the transcript.

Tasks: Few-Shot Learning, Sentence

ADD 2022: the First Audio Deep Synthesis Detection Challenge

no code implementations • 17 Feb 2022 • Jiangyan Yi, Ruibo Fu, JianHua Tao, Shuai Nie, Haoxin Ma, Chenglong Wang, Tao Wang, Zhengkun Tian, Ye Bai, Cunhang Fan, Shan Liang, Shiming Wang, Shuai Zhang, Xinrui Yan, Le Xu, Zhengqi Wen, Haizhou Li, Zheng Lian, Bin Liu

Audio deepfake detection is an emerging topic that was included in ASVspoof 2021.

Tasks: Audio Generation, DeepFake Detection

Singing-Tacotron: Global duration control attention and dynamic filter for End-to-end singing voice synthesis

no code implementations • 16 Feb 2022 • Tao Wang, Ruibo Fu, Jiangyan Yi, JianHua Tao, Zhengqi Wen

First, we propose a global duration control attention mechanism for the singing voice synthesis (SVS) model.

Tasks: Singing Voice Synthesis

FSR: Accelerating the Inference Process of Transducer-Based Models by Applying Fast-Skip Regularization

no code implementations • 7 Apr 2021 • Zhengkun Tian, Jiangyan Yi, Ye Bai, JianHua Tao, Shuai Zhang, Zhengqi Wen

Predicting the blank tokens consumes a great deal of computation and time, yet only the non-blank tokens appear in the final output sequence.

Tasks: Position, Speech Recognition
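The saving comes from not running the decoder on frames that would only emit blank. Here is a simplified sketch. It assumes per-frame blank probabilities from an auxiliary CTC-style head, a hypothetical confidence threshold, and at most one symbol per frame. A real transducer also conditions on a prediction network and may emit several symbols per frame:

```python
from typing import List, Sequence


def frames_to_decode(blank_probs: Sequence[float], threshold: float = 0.9) -> List[int]:
    """Indices of frames whose predicted blank probability is below `threshold`.

    Frames predicted to be blank with high confidence are skipped, so the
    (expensive) decoder only runs on the remaining frames.
    """
    return [t for t, p in enumerate(blank_probs) if p < threshold]


def greedy_decode_skipping(frame_logits: Sequence[Sequence[float]],
                           blank_probs: Sequence[float],
                           blank_id: int = 0,
                           threshold: float = 0.9) -> List[int]:
    """Greedy per-frame decoding over the non-skipped frames; blank outputs are dropped."""
    output = []
    for t in frames_to_decode(blank_probs, threshold):
        scores = frame_logits[t]
        token = max(range(len(scores)), key=scores.__getitem__)  # argmax over the vocabulary
        if token != blank_id:
            output.append(token)
    return output
```

With a well-calibrated blank predictor, most frames in typical speech are skipped, which is where the inference speed-up would come from.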

TSNAT: Two-Step Non-Autoregressive Transformer Models for Speech Recognition

1 code implementation • 4 Apr 2021 • Zhengkun Tian, Jiangyan Yi, JianHua Tao, Ye Bai, Shuai Zhang, Zhengqi Wen, Xuefei Liu

To address these two problems, we propose a new model named the two-step non-autoregressive transformer (TSNAT), which improves the performance and accelerates the convergence of the non-autoregressive (NAR) model by learning prior knowledge from a parameter-sharing autoregressive (AR) model.

Tasks: Speech Recognition

Fast End-to-End Speech Recognition via Non-Autoregressive Models and Cross-Modal Knowledge Transferring from BERT

no code implementations • 15 Feb 2021 • Ye Bai, Jiangyan Yi, JianHua Tao, Zhengkun Tian, Zhengqi Wen, Shuai Zhang

Based on this idea, we propose a non-autoregressive speech recognition model called LASO (Listen Attentively, and Spell Once).

Tasks: Language Modelling, Position

Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning

no code implementations • 11 Nov 2020 • Cunhang Fan, Bin Liu, JianHua Tao, Jiangyan Yi, Zhengqi Wen, Leichao Song

This paper proposes a deep time delay neural network (TDNN) for speech enhancement with full data learning.

Tasks: Speech Enhancement
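A TDNN layer can be viewed as splicing input frames at fixed time offsets and applying a shared affine transform. This resembles a (possibly dilated) 1-D convolution. Here is a minimal sketch, assuming ReLU activations and edge clamping at the utterance boundaries. Both are hypothetical choices, and the paper's layer sizes and contexts are not given in the excerpt:

```python
from typing import List, Sequence


def tdnn_layer(frames: Sequence[Sequence[float]],
               offsets: Sequence[int],
               weights: Sequence[Sequence[float]],
               bias: Sequence[float]) -> List[List[float]]:
    """One TDNN layer: splice frames at the given time offsets, then apply affine + ReLU.

    frames:  [T][D] input features
    weights: [out_dim][len(offsets) * D], shared across all time steps
    Out-of-range frames are clamped to the first/last frame (an assumption).
    """
    num_frames = len(frames)
    outputs = []
    for t in range(num_frames):
        spliced: List[float] = []
        for off in offsets:
            idx = min(max(t + off, 0), num_frames - 1)
            spliced.extend(frames[idx])
        row = []
        for w, b in zip(weights, bias):
            act = sum(wi * xi for wi, xi in zip(w, spliced)) + b
            row.append(max(0.0, act))  # ReLU
        outputs.append(row)
    return outputs
```

Stacking layers with wider offsets, such as `(-3, 0, 3)` on top of `(-1, 0, 1)`, grows the temporal receptive field without adding many parameters.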

Gated Recurrent Fusion with Joint Training Framework for Robust End-to-End Speech Recognition

no code implementations • 9 Nov 2020 • Cunhang Fan, Jiangyan Yi, JianHua Tao, Zhengkun Tian, Bin Liu, Zhengqi Wen

Joint training frameworks for speech enhancement and recognition have achieved good performance in robust end-to-end automatic speech recognition (ASR).

Tasks: Automatic Speech Recognition (ASR)

One In A Hundred: Select The Best Predicted Sequence from Numerous Candidates for Streaming Speech Recognition

no code implementations • 28 Oct 2020 • Zhengkun Tian, Jiangyan Yi, Ye Bai, JianHua Tao, Shuai Zhang, Zhengqi Wen

Inspired by the success of two-pass end-to-end models, we introduce a transformer decoder and the two-stage inference method into the streaming CTC model.

Tasks: Language Modelling, Speech Recognition

Decoupling Pronunciation and Language for End-to-end Code-switching Automatic Speech Recognition

no code implementations • 28 Oct 2020 • Shuai Zhang, Jiangyan Yi, Zhengkun Tian, Ye Bai, JianHua Tao, Zhengqi Wen

In this paper, we propose a decoupled transformer model to use monolingual paired data and unpaired text data to alleviate the problem of code-switching data shortage.

Tasks: Automatic Speech Recognition (ASR)

Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition

no code implementations • 16 May 2020 • Zhengkun Tian, Jiangyan Yi, Jian-Hua Tao, Ye Bai, Shuai Zhang, Zhengqi Wen

To address this problem and improve the inference speed, we propose a spike-triggered non-autoregressive transformer model for end-to-end speech recognition, which introduces a CTC module to predict the length of the target sequence and accelerate the convergence.

Tasks: Machine Translation, Speech Recognition
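CTC outputs tend to be "spiky": most frames are blank, and each emitted token appears as a short peak. Here is a minimal sketch of how counting spikes gives a target-length estimate for the non-autoregressive decoder. It assumes greedy per-frame argmax and CTC-style merging of repeated tokens, and the function and parameter names are hypothetical:

```python
from typing import List, Sequence


def ctc_spikes(frame_probs: Sequence[Sequence[float]], blank_id: int = 0) -> List[int]:
    """Frame indices where the CTC argmax is a non-blank token that starts a new emission.

    Consecutive frames with the same non-blank argmax are merged, as in CTC
    collapsing; a blank frame in between separates two emissions of the same token.
    """
    spikes = []
    prev = blank_id
    for t, probs in enumerate(frame_probs):
        token = max(range(len(probs)), key=probs.__getitem__)
        if token != blank_id and token != prev:
            spikes.append(t)
        prev = token
    return spikes
```

`len(ctc_spikes(probs))` is then the number of output positions the decoder would fill in a single parallel pass.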

Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition

no code implementations • 11 May 2020 • Ye Bai, Jiangyan Yi, Jian-Hua Tao, Zhengkun Tian, Zhengqi Wen, Shuai Zhang

Without beam search, the one-pass propagation greatly reduces the inference time of LASO.

Tasks: Sentence, Speech Recognition

Simultaneous Denoising and Dereverberation Using Deep Embedding Features

no code implementations • 6 Apr 2020 • Cunhang Fan, Jian-Hua Tao, Bin Liu, Jiangyan Yi, Zhengqi Wen

In this paper, we propose a joint training method for simultaneous speech denoising and dereverberation using deep embedding features, based on deep clustering (DC).

Tasks: Clustering, Deep Clustering
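Deep clustering trains embeddings V (one unit-norm row per time-frequency bin) so that their affinity matrix matches that of the one-hot source assignments Y. Here is a minimal sketch of the standard objective ||V V^T - Y Y^T||_F^2. It uses the expansion ||V^T V||_F^2 - 2||V^T Y||_F^2 + ||Y^T Y||_F^2, so the N x N affinity matrices are never formed. The naive version is included only to check the identity, and the paper's joint-training losses are not shown:

```python
from typing import List

Matrix = List[List[float]]


def matmul_t(a: Matrix, b: Matrix) -> Matrix:
    """Compute a^T b for a of shape [N][p] and b of shape [N][q]; the result is [p][q]."""
    p, q = len(a[0]), len(b[0])
    return [[sum(a[n][i] * b[n][j] for n in range(len(a))) for j in range(q)] for i in range(p)]


def frob_sq(m: Matrix) -> float:
    """Squared Frobenius norm."""
    return sum(v * v for row in m for v in row)


def deep_clustering_loss(v: Matrix, y: Matrix) -> float:
    """||V V^T - Y Y^T||_F^2 via the low-rank expansion (cost linear in the number of bins N)."""
    return frob_sq(matmul_t(v, v)) - 2.0 * frob_sq(matmul_t(v, y)) + frob_sq(matmul_t(y, y))


def deep_clustering_loss_naive(v: Matrix, y: Matrix) -> float:
    """Direct N x N computation, only for checking the expansion on small inputs."""
    n = len(v)
    total = 0.0
    for i in range(n):
        for j in range(n):
            a = sum(x1 * x2 for x1, x2 in zip(v[i], v[j]))
            b = sum(x1 * x2 for x1, x2 in zip(y[i], y[j]))
            total += (a - b) ** 2
    return total
```

The expansion matters because N (time frames times frequency bins) is large, while the embedding dimension and the number of sources are small.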

Deep Attention Fusion Feature for Speech Separation with End-to-End Post-filter Method

no code implementations • 17 Mar 2020 • Cunhang Fan, Jian-Hua Tao, Bin Liu, Jiangyan Yi, Zhengqi Wen, Xuefei Liu

Secondly, to pay more attention to the outputs of the pre-separation stage, an attention module is applied to acquire deep attention fusion features, which are extracted by computing the similarity between the mixture and the pre-separated speech.

Tasks: Deep Attention, Speech Separation

Spatial and spectral deep attention fusion for multi-channel speech separation using deep embedding features

no code implementations • 5 Feb 2020 • Cunhang Fan, Bin Liu, Jian-Hua Tao, Jiangyan Yi, Zhengqi Wen

Specifically, we apply the deep clustering network to extract deep embedding features.

Tasks: Clustering, Deep Attention

Synchronous Transformers for End-to-End Speech Recognition

no code implementations • 6 Dec 2019 • Zhengkun Tian, Jiangyan Yi, Ye Bai, Jian-Hua Tao, Shuai Zhang, Zhengqi Wen

Once a fixed-length chunk of the input sequence is processed by the encoder, the decoder begins to predict symbols immediately.

Tasks: Speech Recognition
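The chunk-synchronous idea can be sketched as a generator that emits output after every fixed-length chunk instead of waiting for the full utterance. This is a simplified sketch. `encode` and `decode` are hypothetical callables, each chunk is encoded independently, and the decoder's conditioning on previously emitted symbols is omitted:

```python
from typing import Callable, Iterator, List, Sequence


def chunked_streaming_decode(features: Sequence[float],
                             chunk_size: int,
                             encode: Callable[[Sequence[float]], List[float]],
                             decode: Callable[[List[float]], List[str]]) -> Iterator[List[str]]:
    """Yield the symbols predicted for each fixed-length chunk as soon as it is encoded."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(features), chunk_size):
        chunk = features[start:start + chunk_size]  # the final chunk may be shorter
        encoded = encode(chunk)                      # encoder sees only this chunk (no look-ahead)
        yield decode(encoded)                        # decoder emits symbols immediately
```

Latency is then bounded by the chunk length rather than the utterance length, at the cost of less right-context per prediction.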

Integrating Knowledge into End-to-End Speech Recognition from External Text-Only Data

no code implementations • 4 Dec 2019 • Ye Bai, Jiangyan Yi, Jian-Hua Tao, Zhengqi Wen, Zhengkun Tian, Shuai Zhang

To alleviate the above two issues, we propose a unified method called LST (Learn Spelling from Teachers) to integrate knowledge from external text-only data into an attention-based encoder-decoder (AED) model and to leverage the whole context of a sentence.

Tasks: Language Modelling, Sentence

Self-Attention Transducers for End-to-End Speech Recognition

no code implementations • 28 Sep 2019 • Zhengkun Tian, Jiangyan Yi, Jian-Hua Tao, Ye Bai, Zhengqi Wen

Furthermore, a path-aware regularization is proposed to help the self-attention transducer (SA-T) learn alignments and improve performance.

Tasks: Speech Recognition

Discriminative Learning for Monaural Speech Separation Using Deep Embedding Features

no code implementations • 23 Jul 2019 • Cunhang Fan, Bin Liu, Jian-Hua Tao, Jiangyan Yi, Zhengqi Wen

First, a deep clustering (DC) network is trained to extract deep embedding features, which contain each source's information and help discriminate between the target speakers.

Tasks: Clustering, Deep Clustering

Forward-Backward Decoding for Regularizing End-to-End TTS

1 code implementation • 18 Jul 2019 • Yibin Zheng, Xi Wang, Lei He, Shifeng Pan, Frank K. Soong, Zhengqi Wen, Jian-Hua Tao

Experimental results show that our proposed methods, especially the second one (bidirectional decoder regularization), lead to a significant improvement in both robustness and overall naturalness. They outperform the baseline (a revised version of Tacotron2) by a MOS gap of 0.14 in a challenging test and achieve close to human quality (4.42 vs. 4.49 MOS) on a general test.
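Here is a minimal sketch of a bidirectional agreement term. It assumes the regularizer penalizes the squared distance between the forward (left-to-right) decoder's per-frame states and the backward (right-to-left) decoder's states after time reversal. The exact form and weight used in the paper are not given in the excerpt, so treat this as an illustrative assumption:

```python
from typing import Sequence


def bidirectional_agreement_loss(forward_states: Sequence[Sequence[float]],
                                 backward_states: Sequence[Sequence[float]]) -> float:
    """Mean squared distance between forward states and time-reversed backward states.

    forward_states[t] describes frame t in left-to-right order; backward_states[t]
    describes frame T-1-t because the backward decoder runs right-to-left.
    """
    if len(forward_states) != len(backward_states):
        raise ValueError("both decoders must produce the same number of frames")
    aligned_backward = list(reversed(backward_states))
    total, count = 0.0, 0
    for f, b in zip(forward_states, aligned_backward):
        for fv, bv in zip(f, b):
            total += (fv - bv) ** 2
            count += 1
    return total / count
```

A training loss could then combine both decoders' reconstruction losses with a weighted agreement term, for example `loss = l_fwd + l_bwd + lam * bidirectional_agreement_loss(h_fwd, h_bwd)`, where `lam`, `l_fwd`, `l_bwd`, `h_fwd`, and `h_bwd` are hypothetical names.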
