no code implementations • 16 Oct 2024 • Jianwei Cui, Yu Gu, Chao Weng, Jie Zhang, Liping Chen, LiRong Dai
This paper presents an advanced end-to-end singing voice synthesis (SVS) system based on the source-filter mechanism that directly translates lyrical and melodic cues into expressive and high-fidelity human-like singing.
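The source-filter mechanism the abstract refers to can be illustrated with the classical signal-processing view it is named after: an excitation source (pulse train at the target pitch, the melodic cue) shaped by a vocal-tract filter (the timbral/lyrical cue). This is only a toy sketch of the underlying idea, not the paper's neural architecture; all names and values here are illustrative.

```python
import numpy as np

def pulse_train(f0_hz, duration_s, sr=16000):
    """Glottal-like impulse train: one pulse per pitch period (the 'source')."""
    n = int(duration_s * sr)
    src = np.zeros(n)
    period = int(sr / f0_hz)
    src[::period] = 1.0
    return src

def vocal_tract_filter(src, impulse_response):
    """Shape the excitation with a (toy) tract impulse response (the 'filter')."""
    return np.convolve(src, impulse_response)[: len(src)]

sr = 16000
source = pulse_train(f0_hz=220.0, duration_s=0.05, sr=sr)  # pitch A3
ir = np.exp(-np.arange(64) / 8.0)                          # toy decaying response
voiced = vocal_tract_filter(source, ir)
```

In the neural setting, both the excitation and the filtering are learned, but the factorization into pitch-driven source and timbre-driven filter is the same.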
no code implementations • 12 Jun 2024 • Rui Wang, Liping Chen, Kong Aik Lee, Zhen-Hua Ling
Voice anonymization has been developed as a technique for preserving privacy by replacing the speaker's voice in a speech signal with that of a pseudo-speaker, thereby obscuring the original voice attributes from machine recognition and human perception.
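A common embedding-level realization of this idea replaces the speaker embedding with a pseudo-speaker embedding built from a pool of other speakers before resynthesis. The sketch below is a hypothetical minimal version (random stand-in embeddings, cosine-based selection of dissimilar pool speakers), not the method of this specific paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def pseudo_speaker(original_emb, pool, n_farthest=3):
    """Average the pool embeddings least cosine-similar to the original speaker."""
    pool = np.asarray(pool)
    cos = pool @ original_emb / (
        np.linalg.norm(pool, axis=1) * np.linalg.norm(original_emb)
    )
    idx = np.argsort(cos)[:n_farthest]      # pick the most dissimilar speakers
    pseudo = pool[idx].mean(axis=0)
    return pseudo / np.linalg.norm(pseudo)  # unit-normalized pseudo embedding

orig = rng.normal(size=192)                 # toy 192-dim speaker embedding
pool = rng.normal(size=(50, 192))           # embeddings of other speakers
anon = pseudo_speaker(orig, pool)           # used in place of orig at synthesis
```

The resynthesized speech then carries the pseudo-speaker's voice attributes while the linguistic content is preserved.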
no code implementations • 12 Jun 2024 • Hengyu Li, Kangdi Mei, Zhaoci Liu, Yang Ai, Liping Chen, Jie Zhang, ZhenHua Ling
It has been shown in the literature that speech representations extracted by self-supervised pre-trained models exhibit similarities with human brain activations during speech perception, and that fine-tuning speech representation models on downstream tasks can further improve this similarity.
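One common way such model-to-brain similarity is quantified is linear centered kernel alignment (CKA) between two feature matrices recorded for the same stimuli. The sketch below uses random stand-ins for model features and brain recordings; CKA is a standard measure, but it is an assumption here, not necessarily the metric used in this paper.

```python
import numpy as np

def linear_cka(x, y):
    """Linear CKA between feature matrices of shape (samples, dims)."""
    x = x - x.mean(axis=0)                  # center each feature dimension
    y = y - y.mean(axis=0)
    num = np.linalg.norm(x.T @ y, "fro") ** 2
    den = np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro")
    return num / den

rng = np.random.default_rng(1)
feats = rng.normal(size=(100, 32))          # hypothetical model representations
# 'Brain' responses as a noisy linear readout of the features:
brain = feats @ rng.normal(size=(32, 16)) + 0.1 * rng.normal(size=(100, 16))
cka_related = linear_cka(feats, brain)
cka_unrelated = linear_cka(feats, rng.normal(size=(100, 16)))
```

`cka_related` comes out much higher than `cka_unrelated`, reflecting that CKA rewards shared geometry between the two representation spaces.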
no code implementations • 8 Jun 2024 • Shihao Chen, Yu Gu, Jie Zhang, Na Li, Rilin Chen, Liping Chen, LiRong Dai
We pretrain a variational autoencoder using the well-known open-source So-VITS-SVC project, which builds on the VITS framework; the pretrained autoencoder is then used for the LDM training.
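The training target of an LDM on VAE latents can be sketched with the standard DDPM forward process: the encoder maps audio to a latent z0, noise is added in closed form, and the diffusion model learns to predict that noise. This is a generic illustration under textbook DDPM assumptions (linear schedule, toy latent shapes), not the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(2)

T = 1000
betas = np.linspace(1e-4, 0.02, T)       # linear noise schedule
alphas_cum = np.cumprod(1.0 - betas)     # \bar{alpha}_t

def diffuse(z0, t):
    """Sample z_t ~ q(z_t | z_0) = N(sqrt(abar_t) z0, (1 - abar_t) I)."""
    eps = rng.normal(size=z0.shape)
    a_t = alphas_cum[t]
    z_t = np.sqrt(a_t) * z0 + np.sqrt(1.0 - a_t) * eps
    return z_t, eps                      # eps is the denoiser's training target

z0 = rng.normal(size=(8, 64))            # toy VAE latent (from the encoder)
z_t, eps = diffuse(z0, t=500)
```

The denoising network is then trained to recover `eps` from `(z_t, t)`, and sampling runs the process in reverse before the VAE decoder reconstructs audio.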
no code implementations • 22 Jan 2024 • Shihao Chen, Liping Chen, Jie Zhang, Kong Aik Lee, Zhen-Hua Ling, LiRong Dai
For validation, we employ the open-source pre-trained YourTTS model for speech generation and protect the target speaker's speech in the white-box scenario.
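White-box protection of this kind is typically implemented as a gradient-based perturbation of the waveform that suppresses the speaker identity an encoder extracts. The sketch below uses a toy *linear* encoder (a hypothetical stand-in, so the gradient is analytic) and one FGSM-style step; the actual work operates on a full TTS model such as YourTTS.

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.normal(size=(192, 1024))         # toy linear speaker encoder

def embed(x):
    return W @ x

def protect(x, epsilon=1e-2):
    """One FGSM-like step against the speaker-similarity objective."""
    e = embed(x)
    target = e / np.linalg.norm(e)       # direction of the speaker's embedding
    grad = W.T @ target                  # gradient of <embed(x'), target> at x
    return x - epsilon * np.sign(grad)   # step that reduces the similarity

x = rng.normal(size=1024)                # toy waveform frame
x_adv = protect(x)                       # 'protected' signal
```

After the step, the encoder's projection of the protected signal onto the original speaker direction is strictly smaller, which is the white-box protection objective in miniature.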
1 code implementation • 31 Oct 2022 • Kun Wei, Long Zhou, Ziqiang Zhang, Liping Chen, Shujie Liu, Lei He, Jinyu Li, Furu Wei
However, direct S2ST suffers from data scarcity, because parallel corpora pairing speech in the source language with speech in the target language are very rare.
no code implementations • 8 Jun 2021 • Liping Chen, Yan Deng, Xi Wang, Frank K. Soong, Lei He
Experimental results obtained with the Transformer TTS show that the proposed BERT can extract fine-grained, segment-level prosody, which is complementary to utterance-level prosody and improves the final prosody of the TTS speech.
no code implementations • 1 Mar 2016 • Ruizhi Liao, Cristian Roman, Peter Ball, Shumao Ou, Liping Chen
As the number of vehicles continues to grow, parking spaces are at a premium on city streets.