no code implementations • 27 Sep 2023 • Chunyu Qiang, Hao Li, Yixin Tian, Yi Zhao, Ying Zhang, Longbiao Wang, Jianwu Dang
To address these issues, we propose a minimally-supervised high-fidelity speech synthesis method, where all modules are constructed based on the diffusion models.
no code implementations • 1 Sep 2023 • Chunyu Qiang, Hao Li, Yixin Tian, Ruibo Fu, Tao Wang, Longbiao Wang, Jianwu Dang
However, existing contrastive learning methods in the audio field focus on extracting global descriptive information for downstream audio classification tasks, making them unsuitable for TTS, VC, and ASR tasks.
no code implementations • 28 Jul 2023 • Chunyu Qiang, Hao Li, Hao Ni, He Qu, Ruibo Fu, Tao Wang, Longbiao Wang, Jianwu Dang
However, existing methods suffer from three problems: the high dimensionality and waveform distortion of discrete speech representations, the prosodic averaging problem caused by the duration prediction model in non-autoregressive frameworks, and the information redundancy and dimension explosion problems of existing semantic encoding methods.
no code implementations • 14 Mar 2023 • Chunyu Qiang, Peng Yang, Hao Che, Ying Zhang, Xiaorui Wang, Zhongyuan Wang
Cross-speaker style transfer in speech synthesis aims at transferring a style from source speaker to synthesized speech of a target speaker's timbre.
no code implementations • 13 Dec 2022 • Chunyu Qiang, Peng Yang, Hao Che, Xiaorui Wang, Zhongyuan Wang
In order to improve the style extraction ability of the reference encoder, a style invariant and contrastive data augmentation method is proposed.
no code implementations • 17 Nov 2022 • Chunyu Qiang, Peng Yang, Hao Che, Jinba Xiao, Xiaorui Wang, Zhongyuan Wang
In this paper we propose a simple back-translation-style data augmentation method for mandarin Chinese polyphone disambiguation, utilizing a large amount of unlabeled text data.