no code implementations • 15 Sep 2023 • Reo Shimizu, Ryuichi Yamamoto, Masaya Kawamura, Yuma Shirahata, Hironori Doi, Tatsuya Komatsu, Kentaro Tachibana
We propose PromptTTS++, a prompt-based text-to-speech (TTS) synthesis system that allows control over speaker identity using natural language descriptions.
1 code implementation • 28 Oct 2022 • Masaya Kawamura, Yuma Shirahata, Ryuichi Yamamoto, Kentaro Tachibana
We propose a lightweight end-to-end text-to-speech model using multi-band generation and inverse short-time Fourier transform.
no code implementations • 28 Oct 2022 • Yuma Shirahata, Ryuichi Yamamoto, Eunwoo Song, Ryo Terashima, Jae-Min Kim, Kentaro Tachibana
From these features, the proposed periodicity generator produces a sample-level sinusoidal source that enables the waveform decoder to accurately reproduce the pitch.
no code implementations • 21 Apr 2022 • Ryo Terashima, Ryuichi Yamamoto, Eunwoo Song, Yuma Shirahata, Hyun-Wook Yoon, Jae-Min Kim, Kentaro Tachibana
Because pitch-shift data augmentation enables the coverage of a variety of pitch dynamics, it greatly stabilizes training for both VC and TTS models, even when only 1, 000 utterances of the target speaker's neutral data are available.