no code implementations • 3 Apr 2024 • Jaehyeon Kim, Keon Lee, Seungjun Chung, Jaewoong Cho
With the emergence of neural audio codecs, which encode multiple streams of discrete tokens from audio, large language models have recently gained attention as a promising approach for zero-shot Text-to-Speech (TTS) synthesis.
1 code implementation • 16 Jul 2022 • Sangyun Lee, Hyungjin Chung, Jaehyeon Kim, Jong Chul Ye
We further propose a blur diffusion as a special case, where each frequency component of an image is diffused at different speeds.
10 code implementations • NeurIPS 2020 • Jungil Kong, Jaehyeon Kim, Jaekyoung Bae
Several recent works on speech synthesis have employed generative adversarial networks (GANs) to produce raw waveforms.
Ranked #10 on Speech Synthesis on LibriTTS
5 code implementations • NeurIPS 2020 • Jaehyeon Kim, Sungwon Kim, Jungil Kong, Sungroh Yoon
By leveraging the properties of flows, MAS searches for the most probable monotonic alignment between text and the latent representation of speech.
Ranked #4 on Text-To-Speech Synthesis on LJSpeech (using extra training data)
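The monotonic alignment search mentioned in the entry above can be illustrated with a small dynamic-programming sketch. This is not the authors' implementation. It assumes a precomputed matrix `log_lik[i][j]` holding the log-likelihood of latent speech frame `j` under text token `i`, and it assumes that every token must cover at least one frame, with no tokens skipped. The function name and the toy values are hypothetical.

```python
import math


def monotonic_alignment_search(log_lik):
    """Return (alignment, score) for the best monotonic alignment.

    log_lik[i][j] is an assumed, precomputed log-likelihood of frame j
    under text token i. alignment[j] is the token index chosen for frame j.
    """
    n_text, n_frames = len(log_lik), len(log_lik[0])
    if n_text > n_frames:
        raise ValueError("need at least one frame per text token")

    # q[i][j]: best cumulative score of any monotonic path ending at (i, j).
    q = [[-math.inf] * n_frames for _ in range(n_text)]
    q[0][0] = log_lik[0][0]
    for j in range(1, n_frames):
        # Token i is reachable at frame j only if i <= j.
        for i in range(min(j + 1, n_text)):
            stay = q[i][j - 1]                              # same token as previous frame
            advance = q[i - 1][j - 1] if i > 0 else -math.inf  # move to the next token
            q[i][j] = max(stay, advance) + log_lik[i][j]

    # Backtrack from the last token at the last frame.
    alignment = [0] * n_frames
    i = n_text - 1
    for j in range(n_frames - 1, -1, -1):
        alignment[j] = i
        if i > 0 and q[i - 1][j - 1] >= q[i][j - 1]:
            i -= 1
    return alignment, q[n_text - 1][n_frames - 1]


# Toy example: 2 text tokens, 4 latent frames (values are made up).
toy = [
    [-1, -2, -30, -40],
    [-50, -20, -3, -1],
]
alignment, score = monotonic_alignment_search(toy)
print(alignment, score)
```

In the toy matrix, the first two frames fit token 0 best and the last two fit token 1, so the search assigns them accordingly. The table has one cell per token-frame pair, so the cost grows with the product of the text length and the number of frames.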