1 code implementation • 15 Aug 2024 • Sang-Hoon Lee, Ha-Yeong Choi, Seong-Whan Lee
This paper introduces PeriodWave-Turbo, a high-fidelity and highly efficient waveform generation model trained via adversarial flow matching optimization (a generic sketch of such a combined objective follows the ranking note below).
Ranked #1 on Speech Synthesis on LibriTTS
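The paper's exact objective is not reproduced here; as a rough illustration only, the sketch below pairs a standard conditional flow-matching regression loss with a generic hinge adversarial term. The `velocity_net`, `discriminator`, and the way they are combined are hypothetical placeholders, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def flow_matching_loss(velocity_net, x1, cond):
    """Conditional flow matching: regress the predicted velocity onto (x1 - x0)
    along the straight path x_t = (1 - t) * x0 + t * x1. Assumes x1 has shape [B, C, T]."""
    x0 = torch.randn_like(x1)                              # noise endpoint of the path
    t = torch.rand(x1.size(0), 1, 1, device=x1.device)     # one timestep per sample
    xt = (1.0 - t) * x0 + t * x1
    target_velocity = x1 - x0
    pred_velocity = velocity_net(xt, t.squeeze(), cond)
    return F.mse_loss(pred_velocity, target_velocity)

def adversarial_losses(discriminator, real_wav, fake_wav):
    """Generic hinge GAN losses for discriminator and generator (not the paper's exact losses)."""
    d_real = discriminator(real_wav)
    d_fake = discriminator(fake_wav.detach())
    d_loss = torch.relu(1.0 - d_real).mean() + torch.relu(1.0 + d_fake).mean()
    g_loss = -discriminator(fake_wav).mean()
    return d_loss, g_loss
```

In such a setup the generator is typically trained on a weighted sum of the flow-matching and adversarial terms; the weighting is a tuning choice and is not specified here.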
1 code implementation • 14 Aug 2024 • Sang-Hoon Lee, Ha-Yeong Choi, Seong-Whan Lee
Additionally, we utilize a multi-period estimator that avoids overlaps to capture the different periodic features of waveform signals (a rough illustration of non-overlapping periodic views follows the ranking note below).
Ranked #4 on Speech Synthesis on LibriTTS
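As one way to picture "avoids overlaps": if the periods are prime, no period divides another, so the 2D periodic views of the waveform do not collapse onto each other. The helper below folds a waveform into one view per prime period; the specific period set and the function itself are illustrative assumptions, not the paper's estimator.

```python
import torch
import torch.nn.functional as F

PRIME_PERIODS = (2, 3, 5, 7, 11)  # primes: no period is a multiple of another

def periodic_views(wav: torch.Tensor, periods=PRIME_PERIODS):
    """Fold a batch of waveforms [B, T] into 2D views [B, 1, T'/p, p], one per period."""
    views = []
    for p in periods:
        b, t = wav.shape
        pad = (p - t % p) % p
        wav_p = F.pad(wav, (0, pad))            # zero-pad so the length is divisible by p
        views.append(wav_p.reshape(b, 1, -1, p))
    return views
```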
1 code implementation • 12 Jun 2024 • Deok-Hyeon Cho, Hyung-Seok Oh, Seung-bin Kim, Sang-Hoon Lee, Seong-Whan Lee
Despite rapid advances in the field of emotional text-to-speech (TTS), recent studies primarily focus on mimicking the average style of a particular emotion.
no code implementations • 17 Jan 2024 • Seung-bin Kim, Sang-Hoon Lee, Seong-Whan Lee
With this method, despite being trained exclusively on the target language's monolingual data, the model can generate target-language speech at inference time using a language-agnostic speech embedding extracted from the source-language speech.
1 code implementation • 16 Jan 2024 • Hyung-Seok Oh, Sang-Hoon Lee, Deok-Hyeon Cho, Seong-Whan Lee
Emotional voice conversion (EVC) involves modifying various acoustic characteristics, such as pitch and spectral envelope, to match a desired emotional state while preserving the speaker's identity.
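To make "modifying pitch while preserving the speaker's identity" concrete, the snippet below uses the WORLD vocoder via the `pyworld` package to raise F0 while leaving the spectral envelope untouched; the 1.2x scaling and the file names are arbitrary illustrative choices, not the paper's conversion method.

```python
import numpy as np
import pyworld as pw
import soundfile as sf

# Load mono speech; WORLD expects float64 samples.
x, fs = sf.read("input.wav")           # placeholder path
x = x.astype(np.float64)

f0, t = pw.harvest(x, fs)              # fundamental frequency (pitch) contour
sp = pw.cheaptrick(x, f0, t, fs)       # spectral envelope (carries timbre / speaker identity)
ap = pw.d4c(x, f0, t, fs)              # aperiodicity

# Raise pitch by 20% while keeping the spectral envelope unchanged.
y = pw.synthesize(f0 * 1.2, sp, ap, fs)
sf.write("pitch_shifted.wav", y, fs)
```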
2 code implementations • 21 Nov 2023 • Sang-Hoon Lee, Ha-Yeong Choi, Seung-bin Kim, Seong-Whan Lee
Furthermore, we significantly improve the naturalness and speaker similarity of synthetic speech even in zero-shot speech synthesis scenarios.
1 code implementation • 8 Nov 2023 • Ha-Yeong Choi, Sang-Hoon Lee, Seong-Whan Lee
Finally, by using a masked prior in the diffusion models, our model can improve speaker adaptation quality.
1 code implementation • 31 Jul 2023 • Hyung-Seok Oh, Sang-Hoon Lee, Seong-Whan Lee
Expressive text-to-speech systems have undergone significant advancements owing to prosody modeling, but conventional methods can still be improved.
no code implementations • 30 Jul 2023 • Sang-Hoon Lee, Ha-Yeong Choi, Hyung-Seok Oh, Seong-Whan Lee
With a hierarchical adaptive structure, the model can adapt to a novel voice style and convert speech progressively.
no code implementations • 13 Jun 2023 • Ji-Sang Hwang, Sang-Hoon Lee, Seong-Whan Lee
Furthermore, we introduce a pause-based word encoder to model word-level prosody based on the pause sequence.
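Purely as a hypothetical sketch of what a pause-based word encoder could look like (the module below is not the authors' architecture), one can embed each word's following pause duration and fuse it with the word features before a recurrent pass:

```python
import torch
import torch.nn as nn

class PauseWordEncoder(nn.Module):
    """Hypothetical encoder: fuse word embeddings with the pause duration after each word."""
    def __init__(self, word_dim: int = 256, hidden_dim: int = 256):
        super().__init__()
        self.pause_proj = nn.Linear(1, word_dim)     # scalar pause duration -> word_dim
        self.rnn = nn.GRU(word_dim, hidden_dim, batch_first=True, bidirectional=True)

    def forward(self, word_emb, pause_durations):
        # word_emb: [B, N_words, word_dim]; pause_durations: [B, N_words] in seconds
        pause_feat = self.pause_proj(pause_durations.unsqueeze(-1))
        prosody, _ = self.rnn(word_emb + pause_feat)  # word-level prosody [B, N_words, 2*hidden_dim]
        return prosody
```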
no code implementations • 12 Jun 2023 • Ji-Sang Hwang, Sang-Hoon Lee, Seong-Whan Lee
To alleviate the challenges posed by model complexity in singing voice synthesis, we propose HiddenSinger, a high-quality singing voice synthesis system using a neural audio codec and latent diffusion models.
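The general two-stage recipe behind "neural audio codec + latent diffusion" (summarized generically here, not HiddenSinger's exact design) is: compress audio into compact codec latents, train a diffusion model on those latents conditioned on the score/lyrics, then decode sampled latents back to audio. All module names below are placeholders, and the noise schedule is deliberately simplified.

```python
import torch
import torch.nn.functional as F

def latent_diffusion_step(codec, denoiser, audio, score_cond):
    """One illustrative training step of diffusion over codec latents."""
    with torch.no_grad():
        z0 = codec.encode(audio)                              # audio -> latents, assumed [B, C, T]
    t = torch.rand(z0.size(0), device=z0.device)              # timestep in [0, 1)
    noise = torch.randn_like(z0)
    zt = torch.sqrt(1.0 - t).view(-1, 1, 1) * z0 + torch.sqrt(t).view(-1, 1, 1) * noise
    pred_noise = denoiser(zt, t, score_cond)                  # predict the injected noise
    return F.mse_loss(pred_noise, noise)

def synthesize(codec, denoiser_sampler, score_cond):
    z = denoiser_sampler(score_cond)        # placeholder: sample latents with any diffusion sampler
    return codec.decode(z)                  # latents -> waveform
```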
1 code implementation • 25 May 2023 • Ha-Yeong Choi, Sang-Hoon Lee, Seong-Whan Lee
To address the above problem, this paper presents decoupled denoising diffusion models (DDDMs) with disentangled representations, which can control the style for each attribute in generative models.
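As a loose illustration of "one denoiser per disentangled attribute" (the attribute split and the summation rule below are assumptions for exposition, not the paper's exact formulation):

```python
import torch
import torch.nn as nn

class DecoupledDenoiser(nn.Module):
    """Illustrative sketch: each attribute (e.g. content, pitch, speaker) gets its own denoiser,
    and the attribute-wise predictions are combined into the final noise estimate."""
    def __init__(self, denoisers: dict):
        super().__init__()
        self.denoisers = nn.ModuleDict(denoisers)   # e.g. {"content": net_c, "pitch": net_p, "speaker": net_s}

    def forward(self, x_t, t, attributes: dict):
        # `attributes` maps each attribute name to its disentangled conditioning representation
        preds = [net(x_t, t, attributes[name]) for name, net in self.denoisers.items()]
        return torch.stack(preds, dim=0).sum(dim=0)
```

Because each attribute has its own denoiser and its own conditioning, swapping one attribute's representation (e.g. the speaker one) changes only that aspect of the generated speech, which is the intuition behind attribute-wise style control.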
no code implementations • NeurIPS 2021 • Sang-Hoon Lee, Ji-Hoon Kim, Hyunseung Chung, Seong-Whan Lee
This insufficiency causes the converted speech to retain the source speech style or to lose the source speech content.
no code implementations • 16 Aug 2021 • Ji-Hoon Kim, Sang-Hoon Lee, Ji-Hyun Lee, Hong-Gyu Jung, Seong-Whan Lee
While numerous attempts have been made at few-shot speaker adaptation, there is still a gap in speaker similarity to the target speaker depending on the amount of adaptation data.
no code implementations • 5 Jun 2021 • Hyunseung Chung, Sang-Hoon Lee, Seong-Whan Lee
Experimental results also show the superiority of our proposed model over other state-of-the-art TTS models with internal and external aligners.
2 code implementations • 4 Jun 2021 • Ji-Hoon Kim, Sang-Hoon Lee, Ji-Hyun Lee, Seong-Whan Lee
Although recent work on neural vocoders has improved the quality of synthesized audio, a gap still exists between generated and ground-truth audio in the frequency domain.
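One common, generic way to penalize such a frequency-domain mismatch is a multi-resolution STFT loss; the function below is that generic loss, offered only as background, not this paper's specific solution.

```python
import torch
import torch.nn.functional as F

def multi_resolution_stft_loss(fake, real, fft_sizes=(512, 1024, 2048)):
    """Spectral convergence + log-magnitude L1 between generated and real waveforms [B, T]."""
    loss = 0.0
    for n_fft in fft_sizes:
        hop = n_fft // 4
        window = torch.hann_window(n_fft, device=real.device)
        spec_f = torch.stft(fake, n_fft, hop_length=hop, window=window, return_complex=True).abs()
        spec_r = torch.stft(real, n_fft, hop_length=hop, window=window, return_complex=True).abs()
        loss = loss + (spec_r - spec_f).norm(p="fro") / spec_r.norm(p="fro").clamp(min=1e-8)
        loss = loss + F.l1_loss(torch.log(spec_f + 1e-7), torch.log(spec_r + 1e-7))
    return loss / len(fft_sizes)
```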
no code implementations • 16 Aug 2020 • Hyun-Wook Yoon, Sang-Hoon Lee, Hyeong-Rae Noh, Seong-Whan Lee
In recent work, flow-based neural vocoders have shown significant improvements in real-time speech generation.
1 code implementation • ICCV 2019 • Anh-Duc Nguyen, Seonghwa Choi, Woojae Kim, Sang-Hoon Lee
In this paper, we present a novel deep method to reconstruct a point cloud of an object from a single still image.
3 code implementations • 7 Aug 2019 • Raehyun Kim, Chan Ho So, Minbyul Jeong, Sang-Hoon Lee, Jinkyu Kim, Jaewoo Kang
Methods that use relational data for stock market prediction have been recently proposed, but they are still in their infancy.
no code implementations • ECCV 2018 • Woojae Kim, Jongyoo Kim, Sewoong Ahn, Jinwoo Kim, Sang-Hoon Lee
Incorporating spatio-temporal human visual perception into video quality assessment (VQA) remains a formidable issue.
no code implementations • ECCV 2018 • Kyoungoh Lee, Inwoong Lee, Sang-Hoon Lee
We present a novel 3D pose estimation method based on joint interdependency (JI) for acquiring 3D joints from the human pose of an RGB image.
Ranked #222 on 3D Human Pose Estimation on Human3.6M
no code implementations • 4 Jul 2018 • Anh-Duc Nguyen, Woojae Kim, Jongyoo Kim, Sang-Hoon Lee
We propose a generative framework that addresses the video frame interpolation problem.
1 code implementation • ICCV 2017 • Inwoong Lee, Doyoung Kim, Seoungyoon Kang, Sang-Hoon Lee
In our network, we utilize an ensemble average over multiple body parts as the final feature to capture various temporal dependencies (a minimal sketch of such part-feature averaging follows the ranking note below).
Ranked #120 on Skeleton Based Action Recognition on NTU RGB+D
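The part-ensemble idea reduces to a very small operation; the sketch below (shapes and part names assumed for illustration) simply averages per-part feature vectors into the final representation.

```python
import torch

def ensemble_part_features(part_features):
    """Average per-part features (e.g. torso, arms, legs), each of shape [B, D], into one final feature."""
    return torch.stack(part_features, dim=0).mean(dim=0)

# Illustrative usage with random features for three hypothetical parts:
torso, arms, legs = (torch.randn(4, 256) for _ in range(3))
final_feature = ensemble_part_features([torso, arms, legs])   # [4, 256]
```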
1 code implementation • CVPR 2017 • Jongyoo Kim, Sang-Hoon Lee
Since human observers are the ultimate receivers of digital images, image quality metrics should be designed from a human-oriented perspective.
Full-Reference Image Quality Assessment