Search Results for author: Eunwoo Song

Found 13 papers, 4 papers with code

Training Universal Vocoders with Feature Smoothing-Based Augmentation Methods for High-Quality TTS Systems

no code implementations • 4 Sep 2024 • Jeongmin Liu, Eunwoo Song

While universal vocoders have achieved proficient waveform generation across diverse voices, their integration into text-to-speech (TTS) tasks often results in degraded synthetic quality.

Text to Speech

Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech

no code implementations • 28 Aug 2023 • Hyungchan Yoon, ChangHwan Kim, Eunwoo Song, Hyun-Wook Yoon, Hong-Goo Kang

To this end, the baseline TTS model needs to be amply generalized to out-of-domain data (i.e., the target speaker's speech).

Domain Generalization • Text to Speech • +1

Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis

no code implementations • 28 Oct 2022 • Yuma Shirahata, Ryuichi Yamamoto, Eunwoo Song, Ryo Terashima, Jae-Min Kim, Kentaro Tachibana

From these features, the proposed periodicity generator produces a sample-level sinusoidal source that enables the waveform decoder to accurately reproduce the pitch.

Decoder • Diversity • +3

TTS-by-TTS 2: Data-selective augmentation for neural speech synthesis using ranking support vector machine with variational autoencoder

no code implementations • 30 Jun 2022 • Eunwoo Song, Ryuichi Yamamoto, Ohsung Kwon, Chan-Ho Song, Min-Jae Hwang, Suhyeon Oh, Hyun-Wook Yoon, Jin-Seob Kim, Jae-Min Kim

In the proposed method, we first adopt a variational autoencoder whose posterior distribution is utilized to extract latent features representing acoustic similarity between the recorded and synthetic corpora.

Speech Synthesis • Text to Speech

Cross-Speaker Emotion Transfer for Low-Resource Text-to-Speech Using Non-Parallel Voice Conversion with Pitch-Shift Data Augmentation

no code implementations • 21 Apr 2022 • Ryo Terashima, Ryuichi Yamamoto, Eunwoo Song, Yuma Shirahata, Hyun-Wook Yoon, Jae-Min Kim, Kentaro Tachibana

Because pitch-shift data augmentation enables the coverage of a variety of pitch dynamics, it greatly stabilizes training for both VC and TTS models, even when only 1,000 utterances of the target speaker's neutral data are available.

Data Augmentation • Text to Speech • +1
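The pitch-shift augmentation mentioned in the abstract above can be illustrated with a minimal NumPy sketch. This is a naive shift by resampling (which also changes duration; production pipelines typically use duration-preserving methods such as WSOLA/PSOLA or a library like librosa) and is not the paper's exact augmentation procedure:

```python
import numpy as np

def pitch_shift_resample(x, semitones):
    """Naive pitch shift of a mono waveform by linear-interpolation
    resampling. Shifting up by `semitones` shortens the signal by the
    same factor (duration is NOT preserved) -- illustrative only."""
    factor = 2.0 ** (semitones / 12.0)  # frequency scaling factor
    n_out = int(len(x) / factor)
    # Sample the original signal at fractional positions idx = n * factor.
    idx = np.arange(n_out) * factor
    return np.interp(idx, np.arange(len(x)), x)
```

Shifting up one octave (+12 semitones) halves the length, and down one octave doubles it, which is why duration-preserving algorithms are preferred in practice.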

Improving LPCNet-based Text-to-Speech with Linear Prediction-structured Mixture Density Network

1 code implementation • 31 Jan 2020 • Min-Jae Hwang, Eunwoo Song, Ryuichi Yamamoto, Frank Soong, Hong-Goo Kang

In this paper, we propose an improved LPCNet vocoder using a linear prediction (LP)-structured mixture density network (MDN).

Quantization • Speech Synthesis • +1
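The linear prediction (LP) structure underlying LPCNet-style vocoders can be sketched as the standard all-pole synthesis recursion: the waveform is the excitation plus a weighted sum of past output samples. In the paper, the network models the excitation distribution via an LP-structured MDN; the sketch below only shows the deterministic LP recursion, with illustrative coefficients that are not from the paper:

```python
import numpy as np

def lp_synthesis(excitation, lpc):
    """All-pole linear-prediction synthesis:
        x[n] = e[n] + sum_{i=1..p} a_i * x[n - i]
    `lpc` holds the coefficients a_1..a_p. In neural LP vocoders the
    excitation e[n] is generated by the network; here it is given."""
    p = len(lpc)
    x = np.zeros(len(excitation))
    for n in range(len(excitation)):
        # Weighted sum of up to p past output samples.
        pred = sum(lpc[i] * x[n - 1 - i] for i in range(min(p, n)))
        x[n] = excitation[n] + pred
    return x
```

With a single coefficient a_1 = 0.5 and a unit impulse as excitation, the recursion produces the geometric impulse response 1, 0.5, 0.25, ...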

Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram

12 code implementations • 25 Oct 2019 • Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim

We propose Parallel WaveGAN, a distillation-free, fast, and small-footprint waveform generation method using a generative adversarial network.

Generative Adversarial Network • Speech Synthesis • +2
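A key ingredient of Parallel WaveGAN is its multi-resolution spectrogram (STFT) loss, which compares generated and reference waveforms at several FFT resolutions. Below is a minimal NumPy sketch of that idea, combining the usual spectral-convergence and log-magnitude terms; the FFT sizes, hop lengths, and window lengths are assumed illustrative values, not necessarily the paper's configuration:

```python
import numpy as np

def stft_magnitude(x, fft_size, hop, win_len):
    """Magnitude STFT with a Hann window (simple framing, no padding)."""
    window = np.hanning(win_len)
    frames = []
    for start in range(0, len(x) - win_len + 1, hop):
        frame = x[start:start + win_len] * window
        frames.append(np.abs(np.fft.rfft(frame, fft_size)))
    return np.array(frames)  # shape: (n_frames, fft_size // 2 + 1)

def stft_loss(y_hat, y, fft_size, hop, win_len, eps=1e-7):
    """Spectral convergence + log-magnitude distance at one resolution."""
    s_hat = stft_magnitude(y_hat, fft_size, hop, win_len)
    s = stft_magnitude(y, fft_size, hop, win_len)
    sc = np.linalg.norm(s - s_hat) / (np.linalg.norm(s) + eps)
    mag = np.mean(np.abs(np.log(s + eps) - np.log(s_hat + eps)))
    return sc + mag

def multi_resolution_stft_loss(y_hat, y):
    """Average the single-resolution loss over several STFT setups
    (fft_size, hop, win_len); values here are illustrative."""
    configs = [(512, 50, 240), (1024, 120, 600), (2048, 240, 1200)]
    return sum(stft_loss(y_hat, y, *c) for c in configs) / len(configs)
```

In training, this loss is added to the adversarial loss so the generator matches the reference spectra across time-frequency trade-offs, which is part of what lets Parallel WaveGAN avoid teacher-student distillation.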

Effective parameter estimation methods for an ExcitNet model in generative text-to-speech systems

1 code implementation • 21 May 2019 • Ohsung Kwon, Eunwoo Song, Jae-Min Kim, Hong-Goo Kang

In this paper, we propose a high-quality generative text-to-speech (TTS) system using an effective spectrum and excitation estimation method.

Speech Synthesis • Text to Speech • +1

Probability density distillation with generative adversarial networks for high-quality parallel waveform generation

1 code implementation • 9 Apr 2019 • Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim

As this process encourages the student to model the distribution of realistic speech waveforms, the perceptual quality of the synthesized speech becomes much more natural.

ExcitNet vocoder: A neural excitation model for parametric speech synthesis systems

no code implementations • 9 Nov 2018 • Eunwoo Song, Kyungguen Byun, Hong-Goo Kang

Conventional WaveNet-based neural vocoding systems significantly improve the perceptual quality of synthesized speech by statistically generating a time sequence of speech waveforms through an auto-regressive framework.

Speech Synthesis

Speaker-adaptive neural vocoders for parametric speech synthesis systems

no code implementations • 8 Nov 2018 • Eunwoo Song, Jin-Seob Kim, Kyungguen Byun, Hong-Goo Kang

To generate more natural speech signals with the constraint of limited training data, we propose a speaker adaptation task with an effective variation of neural vocoding models.

Speech Synthesis • Text to Speech
