no code implementations • 31 Aug 2023 • Haohan Guo, Fenglong Xie, Jiawen Kang, Yujia Xiao, Xixin Wu, Helen Meng
This paper proposes a novel semi-supervised TTS framework, QS-TTS, to improve TTS quality with lower supervised data requirements via Vector-Quantized Self-Supervised Speech Representation Learning (VQ-S3RL), which leverages additional unlabeled speech audio.
1 code implementation • 27 Oct 2022 • Haohan Guo, Fenglong Xie, Xixin Wu, Hui Lu, Helen Meng
Moreover, we optimize the training strategy by leveraging more audio to better learn MSMCRs for low-resource languages.
1 code implementation • 22 Sep 2022 • Haohan Guo, Fenglong Xie, Frank K. Soong, Xixin Wu, Helen Meng
A vector-quantized variational autoencoder (VQ-VAE) based feature analyzer encodes Mel spectrograms of the training speech by progressively down-sampling them in multiple stages into MSMC Representations (MSMCRs) with different time resolutions, quantizing each stage with its own VQ codebook.
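The multi-stage encoding described above can be sketched as follows. This is a minimal numpy illustration, not the paper's learned VQ-VAE: the codebooks here are random rather than trained, and the stride values, codebook sizes, and the helper names `quantize` and `msmc_encode` are illustrative assumptions.

```python
import numpy as np

def quantize(frames, codebook):
    # Nearest-codeword assignment: each frame is replaced by its
    # closest codebook vector (squared Euclidean distance).
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)
    return codebook[idx], idx

def msmc_encode(mel, codebooks, strides):
    """Encode a Mel spectrogram into multi-stage representations by
    progressively down-sampling in time and quantizing each stage
    with its own codebook (a sketch of the MSMCR idea)."""
    stages = []
    x = mel
    for codebook, stride in zip(codebooks, strides):
        # Down-sample in time: average non-overlapping windows of `stride` frames.
        T = (len(x) // stride) * stride
        x = x[:T].reshape(-1, stride, x.shape[1]).mean(axis=1)
        q, idx = quantize(x, codebook)
        stages.append((q, idx))
        x = q  # the next stage operates on the quantized output

    return stages

rng = np.random.default_rng(0)
mel = rng.normal(size=(64, 80))                  # 64 frames, 80 Mel bins
codebooks = [rng.normal(size=(16, 80)) for _ in range(2)]
stages = msmc_encode(mel, codebooks, strides=[2, 2])
print([q.shape for q, _ in stages])              # two time resolutions: 32 and 16 frames
```

Each stage halves the time resolution here, so the two stages hold coarse-to-fine views of the same utterance, indexed into their respective codebooks.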
no code implementations • 28 Sep 2021 • Shilun Lin, Wenchao Su, Li Meng, Fenglong Xie, Xinhui Li, Li Lu
Thirdly, a duration predictor, rather than an attention model, connects the above hybrid encoder and decoder.
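The duration-predictor connection can be illustrated with a length-regulation step: each encoder state is repeated for its predicted number of output frames, giving the decoder an explicitly aligned input instead of a learned attention alignment. This is a hedged sketch in the style of FastSpeech-like models, not the paper's exact module; the function name `length_regulate` and the toy dimensions are assumptions.

```python
import numpy as np

def length_regulate(encoder_states, durations):
    """Expand each encoder state by its predicted duration (in frames),
    replacing attention-based alignment with explicit repetition."""
    return np.repeat(encoder_states, durations, axis=0)

states = np.arange(6, dtype=float).reshape(3, 2)   # 3 encoder states, dim 2
durations = np.array([2, 1, 3])                    # predicted frames per state
frames = length_regulate(states, durations)
print(frames.shape)                                # (6, 2): length = sum of durations
```

Because the decoder length is fixed by the predicted durations, this avoids the skipping and repetition failures that attention-based alignment can exhibit in large-scale deployment.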
no code implementations • 30 Jan 2021 • Shilun Lin, Fenglong Xie, Li Meng, Xinhui Li, Li Lu
In this work, a robust and efficient text-to-speech (TTS) synthesis system named Triple M is proposed for large-scale online application.