no code implementations • 4 Sep 2024 • Jeongmin Liu, Eunwoo Song
While universal vocoders have achieved proficient waveform generation across diverse voices, their integration into text-to-speech (TTS) tasks often results in degraded synthesis quality.
no code implementations • 8 Feb 2024 • Heeseung Kim, Soonshin Seo, Kyeongseok Jeong, Ohsung Kwon, Soyoon Kim, Jungwhan Kim, Jaehong Lee, Eunwoo Song, Myungwoo Oh, Jung-Woo Ha, Sungroh Yoon, Kang Min Yoo
Recent work shows promising results in expanding the capabilities of large language models (LLMs) to directly understand and synthesize speech.
Automatic Speech Recognition (ASR) +2
no code implementations • 28 Aug 2023 • Hyungchan Yoon, ChangHwan Kim, Eunwoo Song, Hyun-Wook Yoon, Hong-Goo Kang
To this end, the baseline TTS model needs to be sufficiently generalized to out-of-domain data (i.e., the target speaker's speech).
no code implementations • 28 Oct 2022 • Yuma Shirahata, Ryuichi Yamamoto, Eunwoo Song, Ryo Terashima, Jae-Min Kim, Kentaro Tachibana
From these features, the proposed periodicity generator produces a sample-level sinusoidal source that enables the waveform decoder to accurately reproduce the pitch.
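A minimal sketch of this idea, assuming a frame-level F0 contour and a fixed hop size (function and parameter names are illustrative, not from the paper): the pitch contour is upsampled to the sample rate and its phase is integrated to form a sinusoidal excitation.

```python
import numpy as np

def sinusoidal_source(f0, hop_length=256, sample_rate=24000):
    """Upsample a frame-level F0 contour and integrate its phase to
    produce a sample-level sinusoidal excitation signal."""
    # Repeat each frame's F0 value for hop_length samples.
    f0_samples = np.repeat(f0, hop_length)
    # Instantaneous phase is the cumulative sum of angular frequency.
    phase = 2 * np.pi * np.cumsum(f0_samples / sample_rate)
    source = np.sin(phase)
    # Zero out unvoiced regions (conventionally marked by F0 == 0).
    source[f0_samples == 0] = 0.0
    return source

# Example: 100 frames of a 200 Hz pitch contour.
src = sinusoidal_source(np.full(100, 200.0))
```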
no code implementations • 30 Jun 2022 • Eunwoo Song, Ryuichi Yamamoto, Ohsung Kwon, Chan-Ho Song, Min-Jae Hwang, Suhyeon Oh, Hyun-Wook Yoon, Jin-Seob Kim, Jae-Min Kim
In the proposed method, we first adopt a variational autoencoder whose posterior distribution is utilized to extract latent features representing acoustic similarity between the recorded and synthetic corpora.
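As a rough illustration of extracting latent features from a diagonal-Gaussian posterior (the module below is a hypothetical stand-in, not the paper's architecture):

```python
import torch
import torch.nn as nn

class AcousticVAEEncoder(nn.Module):
    """Illustrative VAE encoder: maps acoustic features (e.g. 80-dim
    mel-spectrogram frames) to a Gaussian posterior over latents."""
    def __init__(self, in_dim=80, latent_dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.mean = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)

    def forward(self, x):
        h = self.net(x)
        mu, logvar = self.mean(h), self.logvar(h)
        # Reparameterization trick: sample z ~ q(z|x).
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return z, mu, logvar

# Latents extracted from recorded vs. synthetic frames can then be
# compared to score their acoustic similarity.
enc = AcousticVAEEncoder()
z, mu, logvar = enc(torch.randn(4, 80))
```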
no code implementations • 21 Apr 2022 • Ryo Terashima, Ryuichi Yamamoto, Eunwoo Song, Yuma Shirahata, Hyun-Wook Yoon, Jae-Min Kim, Kentaro Tachibana
Because pitch-shift data augmentation enables the coverage of a variety of pitch dynamics, it greatly stabilizes training for both VC and TTS models, even when only 1,000 utterances of the target speaker's neutral data are available.
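A minimal sketch of pitch-shift augmentation, using librosa's shifter as a stand-in (the paper's actual augmentation pipeline may differ; all names below are illustrative):

```python
import librosa

def pitch_shift_augment(wav_path, semitone_shifts=(-2, -1, 1, 2)):
    """Create pitch-shifted copies of an utterance to widen the pitch
    coverage of a small training corpus."""
    y, sr = librosa.load(wav_path, sr=None)
    return [librosa.effects.pitch_shift(y, sr=sr, n_steps=s)
            for s in semitone_shifts]
```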
no code implementations • 27 Oct 2020 • Ryuichi Yamamoto, Eunwoo Song, Min-Jae Hwang, Jae-Min Kim
This paper proposes voicing-aware conditional discriminators for Parallel WaveGAN-based waveform synthesis systems.
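A rough sketch of the idea, assuming a sample-level voicing mask is available (the architecture below is a simplified stand-in, not the paper's discriminator design): separate discriminators specialize in voiced and unvoiced regions.

```python
import torch
import torch.nn as nn

class VoicingAwareDiscriminators(nn.Module):
    """Two discriminators, each applied only to the waveform regions
    selected by a sample-level voicing mask (1 = voiced)."""
    def __init__(self):
        super().__init__()
        self.d_voiced = nn.Sequential(
            nn.Conv1d(1, 64, 9, padding=4), nn.LeakyReLU(0.2),
            nn.Conv1d(64, 1, 9, padding=4))
        self.d_unvoiced = nn.Sequential(
            nn.Conv1d(1, 64, 9, padding=4), nn.LeakyReLU(0.2),
            nn.Conv1d(64, 1, 9, padding=4))

    def forward(self, wav, voicing_mask):
        # wav, voicing_mask: (batch, 1, samples)
        scores_v = self.d_voiced(wav) * voicing_mask
        scores_u = self.d_unvoiced(wav) * (1 - voicing_mask)
        return scores_v + scores_u
```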
1 code implementation • 31 Jan 2020 • Min-Jae Hwang, Eunwoo Song, Ryuichi Yamamoto, Frank Soong, Hong-Goo Kang
In this paper, we propose an improved LPCNet vocoder using a linear prediction (LP)-structured mixture density network (MDN).
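As a simplified illustration (a single Gaussian stands in for the full mixture; all names are hypothetical): the LP prediction p_t, computed from past samples and LP coefficients, shifts the predicted excitation mean so the density is defined directly over the speech sample s_t.

```python
import torch
import torch.nn as nn

class LPMDNHead(nn.Module):
    """Illustrative LP-structured density head: the network predicts
    excitation statistics, and the LP prediction is added to the mean."""
    def __init__(self, hidden_dim=128):
        super().__init__()
        self.mu = nn.Linear(hidden_dim, 1)         # excitation mean
        self.log_sigma = nn.Linear(hidden_dim, 1)  # excitation log-std

    def forward(self, h, lp_prediction):
        # Mean of s_t = LP prediction p_t + predicted excitation mean.
        mean = lp_prediction + self.mu(h)
        sigma = torch.exp(self.log_sigma(h))
        return torch.distributions.Normal(mean, sigma)
```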
12 code implementations • 25 Oct 2019 • Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim
We propose Parallel WaveGAN, a distillation-free, fast, and small-footprint waveform generation method using a generative adversarial network.
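A key ingredient of this approach is a multi-resolution STFT auxiliary loss trained jointly with the adversarial loss. A minimal sketch (the resolution settings below follow common Parallel WaveGAN configurations; treat the exact values as assumptions):

```python
import torch
import torch.nn.functional as F

def stft_loss(x, y, fft_size, hop_length, win_length):
    """Spectral convergence + log STFT magnitude loss at one resolution."""
    window = torch.hann_window(win_length)
    X = torch.stft(x, fft_size, hop_length=hop_length, win_length=win_length,
                   window=window, return_complex=True).abs()
    Y = torch.stft(y, fft_size, hop_length=hop_length, win_length=win_length,
                   window=window, return_complex=True).abs()
    sc = torch.norm(Y - X, p="fro") / torch.norm(Y, p="fro")
    mag = F.l1_loss(torch.log(X + 1e-7), torch.log(Y + 1e-7))
    return sc + mag

def multi_resolution_stft_loss(x, y):
    """Average the loss over several analysis resolutions so that no
    single time-frequency trade-off dominates training."""
    resolutions = [(1024, 120, 600), (2048, 240, 1200), (512, 50, 240)]
    return sum(stft_loss(x, y, *r) for r in resolutions) / len(resolutions)
```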
1 code implementation • 21 May 2019 • Ohsung Kwon, Eunwoo Song, Jae-Min Kim, Hong-Goo Kang
In this paper, we propose a high-quality generative text-to-speech (TTS) system using an effective spectrum and excitation estimation method.
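As a rough source-filter illustration of how a spectrum (here, per-frame LPC coefficients) and an excitation signal combine into speech (a simplification, not the paper's method; filter-state carryover across frames is ignored and names are illustrative):

```python
import numpy as np
from scipy.signal import lfilter

def synthesize(excitation, lpc_frames, frame_length=256):
    """Pass each excitation frame through the all-pole filter 1/A(z)
    defined by that frame's LPC coefficients a = [1, a_1, ..., a_p].
    Filter state resets at frame boundaries, a deliberate simplification."""
    out = []
    for i, a in enumerate(lpc_frames):
        frame = excitation[i * frame_length:(i + 1) * frame_length]
        out.append(lfilter([1.0], a, frame))
    return np.concatenate(out)
```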
1 code implementation • 9 Apr 2019 • Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim
As this process encourages the student to model the distribution of realistic speech waveforms, the perceptual quality of the synthesized speech becomes much more natural.
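A schematic of such a combined objective, with the distillation term estimated from per-sample log-densities and an LSGAN-style adversarial term (the weighting and exact loss forms are assumptions, not the paper's specification):

```python
import torch

def student_loss(student_wav, student_logp, teacher_logp, discriminator,
                 lambda_adv=4.0):
    """Hypothetical combined objective for the student model.
    student_logp / teacher_logp: log-densities of waveforms drawn from
    the student, evaluated under student and teacher respectively,
    giving a Monte Carlo estimate of KL(student || teacher).
    lambda_adv is an assumed weighting, not the paper's value."""
    kl = (student_logp - teacher_logp).mean()
    adv = ((1.0 - discriminator(student_wav)) ** 2).mean()  # LSGAN-style
    return kl + lambda_adv * adv
```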
no code implementations • 9 Nov 2018 • Eunwoo Song, Kyungguen Byun, Hong-Goo Kang
Conventional WaveNet-based neural vocoding systems significantly improve the perceptual quality of synthesized speech by statistically generating the speech waveform sample by sample within an auto-regressive framework.
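A minimal sketch of the auto-regressive loop, with `predict_next` standing in for a trained WaveNet step that returns a categorical distribution over quantized amplitude values (all names are hypothetical):

```python
import numpy as np

def generate_autoregressive(predict_next, n_samples, receptive_field=1024):
    """Draw one sample at a time, each conditioned on the previously
    generated samples within the model's receptive field."""
    buffer = np.zeros(receptive_field, dtype=np.int64)  # zero-padded context
    for _ in range(n_samples):
        probs = predict_next(buffer[-receptive_field:])
        sample = np.random.choice(len(probs), p=probs)
        buffer = np.append(buffer, sample)
    return buffer[receptive_field:]

# Usage with a dummy step that returns a uniform distribution over 256 bins:
wav = generate_autoregressive(lambda ctx: np.full(256, 1 / 256), 100)
```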
no code implementations • 8 Nov 2018 • Eunwoo Song, Jin-Seob Kim, Kyungguen Byun, Hong-Goo Kang
To generate more natural speech signals with the constraint of limited training data, we propose a speaker adaptation task with an effective variation of neural vocoding models.
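A rough sketch of such adaptation as plain fine-tuning on target-speaker data (placeholder names throughout; the paper's actual adaptation scheme may differ):

```python
import torch

def adapt_vocoder(vocoder, target_loader, loss_fn, steps=1000, lr=1e-4):
    """Fine-tune a vocoder pre-trained on a multi-speaker corpus using a
    small target-speaker dataset; runs for at most `steps` batches."""
    optimizer = torch.optim.Adam(vocoder.parameters(), lr=lr)
    vocoder.train()
    for _, (cond, wav) in zip(range(steps), target_loader):
        optimizer.zero_grad()
        loss = loss_fn(vocoder(cond), wav)  # e.g. a waveform-domain loss
        loss.backward()
        optimizer.step()
    return vocoder
```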