no code implementations • 29 Jan 2025 • Ha-Yeong Choi, JaeHan Park
VoicePrompter is composed of (1) a factorization method that disentangles speech components and (2) a DiT-based conditional flow matching (CFM) decoder that conditions on these factorized features and voice prompts.
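A minimal sketch of the conditional flow matching (CFM) training step such a decoder relies on: the real model is a DiT conditioned on factorized features and a voice prompt, while here a small MLP stands in and all module names, feature dimensions, and the concatenated conditioning are illustrative assumptions rather than the paper's code.

```python
import torch
import torch.nn as nn

class TinyCFMDecoder(nn.Module):
    """Stand-in for a conditional flow matching decoder (illustrative only)."""
    def __init__(self, mel_dim=80, cond_dim=256, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(mel_dim + cond_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, mel_dim),
        )

    def forward(self, x_t, t, cond):
        # Predict the velocity field v(x_t, t | cond).
        return self.net(torch.cat([x_t, cond, t], dim=-1))

def cfm_loss(model, x1, cond):
    """Rectified-flow style CFM loss: regress the straight-line velocity x1 - x0."""
    x0 = torch.randn_like(x1)              # noise sample
    t = torch.rand(x1.size(0), 1)          # random time in [0, 1]
    x_t = (1 - t) * x0 + t * x1            # linear interpolation path
    target_v = x1 - x0                     # constant target velocity along the path
    pred_v = model(x_t, t, cond)
    return ((pred_v - target_v) ** 2).mean()

# Toy usage: mel frames conditioned on concatenated factorized features
# (e.g. content, pitch, and a voice-prompt embedding), all randomly faked here.
model = TinyCFMDecoder()
x1 = torch.randn(8, 80)       # "clean" mel frames
cond = torch.randn(8, 256)    # factorized features + voice prompt (assumed shape)
loss = cfm_loss(model, x1, cond)
loss.backward()
```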
1 code implementation • 15 Aug 2024 • Sang-Hoon Lee, Ha-Yeong Choi, Seong-Whan Lee
This paper introduces PeriodWave-Turbo, a high-fidelity and highly efficient waveform generation model obtained via adversarial flow matching optimization (see the sketch below the leaderboard entry).
Ranked #1 on Speech Synthesis on LibriTTS
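A hedged sketch of the few-step "adversarial flow matching optimization" idea: run a pre-trained flow-matching vocoder for a small, fixed number of Euler steps, then fine-tune it with an adversarial plus reconstruction objective. The generator, discriminator, mel-loss modules, step count, and loss weights below are placeholders assumed for illustration, not PeriodWave-Turbo's actual implementation.

```python
import torch

def few_step_euler(velocity_model, cond, num_steps=4, shape=(1, 16000)):
    """Generate a waveform from noise with a fixed, small number of ODE steps."""
    x = torch.randn(shape)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((shape[0], 1), i * dt)
        x = x + dt * velocity_model(x, t, cond)   # Euler step along the learned flow
    return x

def turbo_finetune_loss(fake_wav, real_wav, discriminator, mel_loss_fn, lambda_mel=45.0):
    # LS-GAN style generator loss plus a mel reconstruction term (weights are assumptions).
    adv = ((discriminator(fake_wav) - 1.0) ** 2).mean()
    return adv + lambda_mel * mel_loss_fn(fake_wav, real_wav)

# Toy usage with stand-in modules (purely illustrative).
vel = lambda x, t, c: -x                           # dummy velocity field
disc = lambda w: w.mean(dim=-1, keepdim=True)      # dummy discriminator score
mel = lambda a, b: ((a - b) ** 2).mean()           # dummy "mel" distance
fake = few_step_euler(vel, cond=None, num_steps=4)
loss = turbo_finetune_loss(fake, torch.zeros_like(fake), disc, mel)
```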
1 code implementation • 14 Aug 2024 • Sang-Hoon Lee, Ha-Yeong Choi, Seong-Whan Lee
Additionally, we utilize a multi-period estimator whose periods do not overlap, capturing different periodic features of the waveform signal (see the sketch below the leaderboard entry).
Ranked #4 on Speech Synthesis on LibriTTS
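A minimal sketch of the reshaping trick behind such a multi-period estimator: fold a 1-D waveform into a 2-D (frames x period) view for several prime periods, so that no two periods share common divisors and the periodic patterns they expose do not overlap. The period set and tensor shapes here are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def fold_by_period(wav: torch.Tensor, period: int) -> torch.Tensor:
    """wav: (batch, samples) -> (batch, frames, period) with right zero-padding."""
    b, n = wav.shape
    pad = (period - n % period) % period
    wav = F.pad(wav, (0, pad))
    return wav.view(b, -1, period)

wav = torch.randn(2, 16000)
prime_periods = [2, 3, 5, 7, 11]          # pairwise co-prime, so the views don't overlap
views = [fold_by_period(wav, p) for p in prime_periods]
for p, v in zip(prime_periods, views):
    print(p, tuple(v.shape))              # each branch would process its own view
```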
2 code implementations • 21 Nov 2023 • Sang-Hoon Lee, Ha-Yeong Choi, Seung-bin Kim, Seong-Whan Lee
Furthermore, we significantly improve the naturalness and speaker similarity of synthetic speech even in zero-shot speech synthesis scenarios.
1 code implementation • 8 Nov 2023 • Ha-Yeong Choi, Sang-Hoon Lee, Seong-Whan Lee
Finally, by using a masked prior in the diffusion models, our model further improves speaker adaptation quality.
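A hedged sketch of one way a "masked prior" can be realized: instead of a pure N(0, I) prior, the terminal diffusion distribution is centered on a coarse prediction (e.g. an average-voice mel), and a random fraction of its frames is masked out so the denoiser cannot lean on it everywhere. The masking scheme, ratio, and shapes are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def masked_prior_sample(prior_mean: torch.Tensor, mask_ratio: float = 0.3):
    """prior_mean: (batch, frames, mel) coarse prediction used as the diffusion prior mean."""
    b, f, d = prior_mean.shape
    keep = (torch.rand(b, f, 1) > mask_ratio).float()   # frame-level keep mask
    masked_mean = prior_mean * keep                      # drop ~mask_ratio of frames
    return masked_mean + torch.randn_like(prior_mean)    # x_T ~ N(masked_mean, I)

x_T = masked_prior_sample(torch.randn(4, 100, 80))
```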
no code implementations • 30 Jul 2023 • Sang-Hoon Lee, Ha-Yeong Choi, Hyung-Seok Oh, Seong-Whan Lee
With a hierarchical adaptive structure, the model can adapt to a novel voice style and convert speech progressively.
1 code implementation • 25 May 2023 • Ha-Yeong Choi, Sang-Hoon Lee, Seong-Whan Lee
To address the above problem, this paper presents decoupled denoising diffusion models (DDDMs) with disentangled representations, which can control the style of each attribute in generative models.
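A hedged sketch of the decoupled-denoiser idea: one small denoiser per speech attribute (here "source" and "filter"), each conditioned only on its own disentangled representation, with their noise predictions combined into the final estimate. The combination rule, attribute set, and module shapes are assumptions made for illustration; the paper's actual decoder differs.

```python
import torch
import torch.nn as nn

class AttrDenoiser(nn.Module):
    """Tiny per-attribute denoiser (illustrative stand-in)."""
    def __init__(self, mel_dim=80, cond_dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(mel_dim + cond_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, mel_dim),
        )

    def forward(self, x_t, t, cond):
        return self.net(torch.cat([x_t, cond, t], dim=-1))

class DecoupledDenoiser(nn.Module):
    def __init__(self, attrs=("source", "filter")):
        super().__init__()
        self.denoisers = nn.ModuleDict({a: AttrDenoiser() for a in attrs})

    def forward(self, x_t, t, conds):
        # Each attribute-specific denoiser sees only its own condition;
        # predictions are averaged into one noise estimate (an assumed rule).
        preds = [self.denoisers[a](x_t, t, conds[a]) for a in self.denoisers]
        return torch.stack(preds).mean(dim=0)

model = DecoupledDenoiser()
x_t = torch.randn(4, 80)
t = torch.rand(4, 1)
conds = {"source": torch.randn(4, 128), "filter": torch.randn(4, 128)}
eps_hat = model(x_t, t, conds)
```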