no code implementations • 25 Sep 2023 • Minki Kang, Wooseok Han, Eunho Yang
The prosody encoder is specifically designed to model prosodic features that cannot be captured from a face image alone, allowing the face encoder to focus solely on capturing speaker identity from the face image.
no code implementations • 23 May 2023 • Minki Kang, Wooseok Han, Sung Ju Hwang, Eunho Yang
Emotional Text-To-Speech (TTS) is an important task in the development of systems (e.g., human-like dialogue agents) that require natural and emotional speech.