Search Results for author: Chenpeng Du

Found 13 papers, 2 papers with code

The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge

no code implementations · 9 Apr 2024 · Yiwei Guo, Chenrun Wang, Yifan Yang, Hankun Wang, Ziyang Ma, Chenpeng Du, Shuai Wang, Hanzheng Li, Shuai Fan, Hui Zhang, Xie Chen, Kai Yu

Discrete speech tokens have become increasingly popular across multiple speech processing fields, including automatic speech recognition (ASR), text-to-speech (TTS), and singing voice synthesis (SVS).

Automatic Speech Recognition (ASR) +2

VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech

no code implementations · 25 Jan 2024 · Chenpeng Du, Yiwei Guo, Hankun Wang, Yifan Yang, Zhikang Niu, Shuai Wang, Hui Zhang, Xie Chen, Kai Yu

Recent TTS models with decoder-only Transformer architecture, such as SPEAR-TTS and VALL-E, achieve impressive naturalness and demonstrate the ability for zero-shot adaptation given a speech prompt.

Hallucination

DiffDub: Person-generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-encoder

no code implementations · 3 Nov 2023 · Tao Liu, Chenpeng Du, Shuai Fan, Feilong Chen, Kai Yu

Our experiments show that our approach outperforms existing methods by considerable margins and delivers seamless, intelligible videos in person-generic and multilingual scenarios.

Data Augmentation

Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS

1 code implementation · 14 Sep 2023 · Yifan Yang, Feiyu Shen, Chenpeng Du, Ziyang Ma, Kai Yu, Daniel Povey, Xie Chen

The proficiency of self-supervised learning (SSL) in speech-related tasks has driven research into using discrete tokens for speech tasks such as recognition and translation, since they offer lower storage requirements and strong potential for applying natural language processing techniques.
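The discrete tokens referenced above are commonly obtained by clustering continuous SSL frame features, e.g. nearest-centroid (k-means style) assignment. A minimal sketch, assuming hypothetical 2-D features and two centroids (real systems cluster high-dimensional SSL hidden states):

```python
import numpy as np

def quantize(features: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Map continuous frame features to discrete token IDs
    by nearest-centroid (k-means style) assignment."""
    # squared Euclidean distance of every frame to every centroid
    d = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)  # one token ID per frame

# toy example: 4 frames of 2-D "features", 2 centroids
feats = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1]])
cents = np.array([[0.0, 0.0], [1.0, 1.0]])
tokens = quantize(feats, cents)
print(tokens.tolist())  # frames collapse to a small discrete vocabulary
```

Once frames are token IDs, standard NLP machinery (n-gram statistics, language-model pretraining) applies directly, which is the storage and modelling advantage the abstract refers to.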

Self-Supervised Learning · Speech Recognition +2

VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching

no code implementations · 10 Sep 2023 · Yiwei Guo, Chenpeng Du, Ziyang Ma, Xie Chen, Kai Yu

Although diffusion models in text-to-speech have become a popular choice due to their strong generative ability, the intrinsic complexity of sampling from diffusion models harms their efficiency.
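Rectified flow matching, the technique VoiceFlow's title names, addresses this sampling cost by learning a velocity field along near-straight paths x_t = (1-t)*x0 + t*x1, so generation reduces to integrating a simple ODE in very few steps. A hedged toy sketch: the closed-form velocity below is a stand-in for the paper's learned network, used only to show why straight paths make few-step Euler integration exact:

```python
import numpy as np

def euler_sample(x0, velocity, steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps,
    as in few-step rectified-flow sampling."""
    x, dt = x0.copy(), 1.0 / steps
    for i in range(steps):
        t = i * dt
        x = x + dt * velocity(x, t)
    return x

# For a perfectly rectified (straight) path the velocity is constant,
# v(x, t) = x1 - x0, so even ONE Euler step reaches the target exactly.
x0 = np.zeros(3)                      # "noise" sample
x1 = np.array([1.0, -2.0, 0.5])      # "data" sample
v = lambda x, t: x1 - x0
print(euler_sample(x0, v, steps=1))
```

A curved diffusion trajectory would need many small steps to follow accurately; the straighter the learned flow, the fewer steps sampling needs, which is the efficiency claim in the abstract.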

DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech

no code implementations · 25 Jun 2023 · Sen Liu, Yiwei Guo, Chenpeng Du, Xie Chen, Kai Yu

Although high-fidelity speech can be obtained for intralingual speech synthesis, cross-lingual text-to-speech (CTTS) is still far from satisfactory, as it is difficult to accurately retain speaker timbres (i.e., speaker similarity) and eliminate accents from the speakers' first language (i.e., nativeness).

Speech Synthesis

Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation

no code implementations · 14 Jun 2023 · Zheng Liang, Zheshu Song, Ziyang Ma, Chenpeng Du, Kai Yu, Xie Chen

Recently, end-to-end (E2E) automatic speech recognition (ASR) models have made great strides and exhibit excellent performance in general speech recognition.

Automatic Speech Recognition (ASR) +5

DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder

no code implementations · 30 Mar 2023 · Chenpeng Du, Qi Chen, Xie Chen, Kai Yu

Additionally, we propose a novel method for generating continuous video frames with the DDIM image decoder trained on individual frames, eliminating the need for modelling the joint distribution of consecutive frames directly.

Talking Face Generation

EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance

no code implementations · 17 Nov 2022 · Yiwei Guo, Chenpeng Du, Xie Chen, Kai Yu

Specifically, instead of being guided with a one-hot vector for the specified emotion, EmoDiff is guided with a soft label where the values of the specified emotion and Neutral are set to α and 1−α, respectively.
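The soft-label construction described above can be written down directly. A minimal sketch, where the emotion inventory and its ordering are hypothetical (only the α / 1−α split between the target emotion and Neutral comes from the abstract):

```python
import numpy as np

EMOTIONS = ["Neutral", "Happy", "Sad", "Angry"]  # hypothetical inventory

def soft_label(emotion: str, alpha: float) -> np.ndarray:
    """Soft label: the specified emotion gets weight alpha, Neutral gets
    1 - alpha, every other emotion gets 0. Reduces to one-hot at alpha=1."""
    y = np.zeros(len(EMOTIONS))
    y[EMOTIONS.index(emotion)] = alpha
    y[EMOTIONS.index("Neutral")] += 1.0 - alpha  # += so emotion="Neutral" works
    return y

print(soft_label("Happy", 0.7))  # Happy at intensity 0.7, rest on Neutral
```

Sliding α from 0 to 1 interpolates between Neutral and the full-intensity emotion, which is what makes the guidance intensity-controllable.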

Denoising

VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature

no code implementations · 2 Apr 2022 · Chenpeng Du, Yiwei Guo, Xie Chen, Kai Yu

The mainstream neural text-to-speech (TTS) pipeline is a cascade system, comprising an acoustic model (AM) that predicts acoustic features from the input transcript and a vocoder that generates a waveform from the given acoustic features.
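The cascade described above can be sketched as two stages with a frame-level feature matrix as the interface. The shapes, hop size, and stand-in functions below are illustrative assumptions, not VQTTS's actual components:

```python
import numpy as np

def acoustic_model(text: str) -> np.ndarray:
    """Stand-in AM: map a transcript to frame-level acoustic features
    (frames x feature_dim), e.g. an 80-bin mel spectrogram.
    Frame count here is a toy heuristic: 10 frames per word."""
    n_frames, feat_dim = 10 * len(text.split()), 80
    return np.zeros((n_frames, feat_dim))

def vocoder(features: np.ndarray) -> np.ndarray:
    """Stand-in vocoder: map acoustic features to a waveform,
    assuming a fixed hop of 256 samples per frame."""
    hop = 256
    return np.zeros(features.shape[0] * hop)

# cascade: transcript -> acoustic features -> waveform
wav = vocoder(acoustic_model("hello world"))
print(wav.shape)  # samples = frames * hop
```

The AM/vocoder interface is what VQTTS replaces: instead of mel-spectrogram-like features, it passes self-supervised VQ acoustic features between the two stages.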

Speech Synthesis · Text-To-Speech Synthesis

Unsupervised word-level prosody tagging for controllable speech synthesis

no code implementations · 15 Feb 2022 · Yiwei Guo, Chenpeng Du, Kai Yu

Although word-level prosody modeling in neural text-to-speech (TTS) has been investigated in recent research for diverse speech synthesis, it is still challenging to control speech synthesis manually without a specific reference.

Speech Synthesis

Rich Prosody Diversity Modelling with Phone-level Mixture Density Network

2 code implementations · 1 Feb 2021 · Chenpeng Du, Kai Yu

Generating natural speech with diverse and smooth prosody patterns is a challenging task.

Speech Synthesis · Text-To-Speech Synthesis · Sound

Data Augmentation for End-to-end Code-switching Speech Recognition

no code implementations · 4 Nov 2020 · Chenpeng Du, Hao Li, Yizhou Lu, Lan Wang, Yanmin Qian

Training a code-switching end-to-end automatic speech recognition (ASR) model normally requires a large amount of data, while code-switching data is often limited.

Automatic Speech Recognition (ASR) +4
