no code implementations • 9 Jan 2025 • Jun-Hyeok Cha, Seung-bin Kim, Hyung-Seok Oh, Seong-Whan Lee
To address this, we introduce JELLY, a novel conversational speech synthesis (CSS) framework that integrates emotion recognition and context reasoning to generate contextually appropriate speech in conversation by fine-tuning a large language model (LLM) with multiple partial LoRA modules.
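The fine-tuning mechanism mentioned above lends itself to a short sketch. Below is a minimal, generic LoRA-wrapped linear layer in PyTorch; the rank, scaling, and choice of which layers to wrap are illustrative assumptions, not JELLY's actual partial-LoRA configuration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pretrained linear layer plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False            # freeze the pretrained weights
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)     # the low-rank update starts at zero
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))
```

Wrapping different subsets of a model's projection layers with several such adapters gives the "multiple partial LoRA" flavor, though the paper's exact placement is not reproduced here.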
no code implementations • 9 Jan 2025 • Jun-Hak Yun, Seung-bin Kim, Seong-Whan Lee
Audio super-resolution is challenging owing to its ill-posed nature.
1 code implementation • 4 Nov 2024 • Deok-Hyeon Cho, Hyung-Seok Oh, Seung-bin Kim, Seong-Whan Lee
Emotional text-to-speech (TTS) technology has achieved significant progress in recent years; however, challenges remain owing to the inherent complexity of emotions and limitations of the available emotional speech datasets and models.
1 code implementation • 12 Jun 2024 • Deok-Hyeon Cho, Hyung-Seok Oh, Seung-bin Kim, Sang-Hoon Lee, Seong-Whan Lee
Despite rapid advances in the field of emotional text-to-speech (TTS), recent studies primarily focus on mimicking the average style of a particular emotion.
1 code implementation • 11 Jun 2024 • Seung-bin Kim, Chan-yeong Lim, Jungwoo Heo, Ju-ho Kim, Hyun-seo Shin, Kyo-Won Koo, Ha-Jin Yu
In speaker verification systems, short utterances remain a persistent challenge: performance degrades primarily because they carry insufficient phonetic information to characterize a speaker.
no code implementations • 17 Jan 2024 • Seung-bin Kim, Sang-Hoon Lee, Seong-Whan Lee
With this method, although the model is trained exclusively on the target language's monolingual data, it can generate target-language speech at inference time using a language-agnostic speech embedding extracted from the source-language speech.
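As a toy illustration of that inference flow, the stubs below stand in for the paper's models (random linear projections with hypothetical names); only the data flow from source-language speech to target-language output is shown.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Dummy stand-ins: an encoder that yields a language-agnostic speech
# embedding, and a synthesizer trained only on target-language data.
encoder = nn.Linear(16000, 256)
synthesizer = nn.Linear(256, 16000)

source_wave = torch.randn(16000)              # source-language speech (dummy)
lang_agnostic_emb = encoder(source_wave)      # language-agnostic embedding
target_wave = synthesizer(lang_agnostic_emb)  # target-language speech (stub)
print(target_wave.shape)                      # torch.Size([16000])
```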
2 code implementations • 21 Nov 2023 • Sang-Hoon Lee, Ha-Yeong Choi, Seung-bin Kim, Seong-Whan Lee
Furthermore, we significantly improve the naturalness and speaker similarity of synthetic speech even in zero-shot speech synthesis scenarios.
no code implementations • 10 Jun 2020 • Hye-jin Shim, Jee-weon Jung, Ju-ho Kim, Seung-bin Kim, Ha-Jin Yu
In this paper, we propose two approaches for building an integrated system of speaker verification and presentation attack detection: an end-to-end monolithic approach and a back-end modular approach.
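A hedged sketch of the back-end modular idea: run the two subsystems independently and fuse their scores. The weighted-sum fusion and its parameters below are illustrative assumptions, not the paper's actual back-end.

```python
def fused_decision(sv_score: float, pad_score: float,
                   weight: float = 0.5, threshold: float = 0.5) -> bool:
    """Accept a trial only when the weighted sum of the speaker-verification
    (SV) score and presentation-attack-detection (PAD) score clears a threshold."""
    combined = weight * sv_score + (1.0 - weight) * pad_score
    return combined >= threshold

# A confident speaker match is still rejected when PAD flags a likely spoof.
print(fused_decision(sv_score=0.9, pad_score=0.05))  # False
```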
1 code implementation • 7 May 2020 • Seung-bin Kim, Jee-weon Jung, Hye-jin Shim, Ju-ho Kim, Ha-Jin Yu
The proposed method segments an input utterance into several short segments and then aggregates the embeddings extracted from each segment to compose a single speaker embedding.
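This procedure is simple enough to sketch directly. Assuming a generic embedding extractor, the segment length, hop, and mean aggregation below are illustrative choices rather than the paper's exact settings.

```python
import torch

def aggregate_embedding(wave: torch.Tensor, encoder,
                        seg_len: int = 16000, hop: int = 8000) -> torch.Tensor:
    """Split a 1-D waveform into overlapping segments, embed each segment,
    and mean-pool the segment embeddings into one speaker embedding."""
    segments = wave.unfold(0, seg_len, hop)    # (n_segments, seg_len)
    seg_embs = torch.stack([encoder(s) for s in segments])
    return seg_embs.mean(dim=0)

# Toy usage with a random projection standing in for a real extractor.
toy_encoder = torch.nn.Linear(16000, 192)
emb = aggregate_embedding(torch.randn(48000), toy_encoder)
print(emb.shape)  # torch.Size([192])
```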
2 code implementations • 1 Apr 2020 • Jee-weon Jung, Seung-bin Kim, Hye-jin Shim, Ju-ho Kim, Ha-Jin Yu
Recent advances in deep learning have facilitated the design of speaker verification systems that directly input raw waveforms.
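As a rough illustration of what "directly input raw waveforms" means in practice, the frontend below learns filterbank-like features from raw samples with strided 1-D convolutions; the filter counts and kernel sizes are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

frontend = nn.Sequential(
    nn.Conv1d(1, 128, kernel_size=251, stride=3, padding=125),
    nn.BatchNorm1d(128),
    nn.LeakyReLU(),
    nn.MaxPool1d(3),
)

x = torch.randn(1, 1, 16000)  # (batch, channel, samples): 1 s at 16 kHz
print(frontend(x).shape)      # torch.Size([1, 128, 1778])
```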