Search Results for author: Wooseok Han

Found 2 papers, 0 papers with code

Face-StyleSpeech: Improved Face-to-Voice latent mapping for Natural Zero-shot Speech Synthesis from a Face Image

No code implementations • 25 Sep 2023 • Minki Kang, Wooseok Han, Eunho Yang

The prosody encoder is specifically designed to model prosodic features that cannot be captured from a face image alone, allowing the face encoder to focus solely on capturing the speaker identity from the face image.

Speech Synthesis, Text to Speech

ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models

No code implementations • 23 May 2023 • Minki Kang, Wooseok Han, Sung Ju Hwang, Eunho Yang

Emotional Text-To-Speech (TTS) is an important task in the development of systems (e.g., human-like dialogue agents) that require natural and emotional speech.

Speech Synthesis, Text to Speech
