no code implementations • 8 Feb 2024 • Heeseung Kim, Soonshin Seo, Kyeongseok Jeong, Ohsung Kwon, Jungwhan Kim, Jaehong Lee, Eunwoo Song, Myungwoo Oh, Sungroh Yoon, Kang Min Yoo
While recent work shows promising results in expanding the capabilities of large language models (LLMs) to directly understand and synthesize speech, an LLM-based strategy for modeling spoken dialogs remains elusive and calls for further investigation.
Automatic Speech Recognition (ASR) +1
no code implementations • 14 Mar 2023 • Chaehun Shin, Heeseung Kim, Che Hyun Lee, Sang-gil Lee, Sungroh Yoon
Although text-to-video (TTV) models have recently achieved remarkable success, few approaches extend TTV to video editing.
no code implementations • 30 May 2022 • Sungwon Kim, Heeseung Kim, Sungroh Yoon
We train the speaker-conditional diffusion model on large-scale untranscribed datasets for classifier-free guidance, then fine-tune it on the reference speech of the target speaker for adaptation, which takes only 40 seconds.
no code implementations • 23 Nov 2021 • Heeseung Kim, Sungwon Kim, Sungroh Yoon
For TTS synthesis, we guide the generative process of the diffusion model with a phoneme classifier trained on a large-scale speech recognition dataset.
no code implementations • 29 Sep 2021 • Heeseung Kim, Sungwon Kim, Sungroh Yoon
By modeling the unconditional distribution of speech, our model can use untranscribed data for training.
no code implementations • ACL 2022 • Sangwon Yu, Jongyoon Song, Heeseung Kim, Seong-min Lee, Woo-Jong Ryu, Sungroh Yoon
Adaptive gradient gating (AGG) addresses the degeneration problem by gating a specific part of the gradient for rare token embeddings.
1 code implementation • ICLR 2022 • Sang-gil Lee, Heeseung Kim, Chaehun Shin, Xu Tan, Chang Liu, Qi Meng, Tao Qin, Wei Chen, Sungroh Yoon, Tie-Yan Liu
Denoising diffusion probabilistic models have recently been proposed to generate high-quality samples by estimating the gradient of the data density.
1 code implementation • ICLR 2022 • Uiwon Hwang, Heeseung Kim, Dahuin Jung, Hyemi Jang, Hyungyu Lee, Sungroh Yoon
Generative adversarial networks (GANs) with clustered latent spaces can perform conditional generation in a completely unsupervised manner.