no code implementations • 7 Aug 2024 • Youkyum Kim, Jaemin Jung, Jihwan Park, Byeong-Yeol Kim, Joon Son Chung
This paper proposes a novel user-defined keyword spotting framework that accurately detects audio keywords based on text enrollment.
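A minimal sketch of the general idea behind text-enrolled keyword spotting: a text encoder embeds the enrolled keyword once, an audio encoder embeds incoming speech into the same space, and detection is a similarity threshold. All module names, sizes, and the threshold are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AudioEncoder(nn.Module):
    """Maps log-mel frames (batch, time, n_mels) to a unit-norm embedding."""
    def __init__(self, n_mels=40, dim=128):
        super().__init__()
        self.rnn = nn.GRU(n_mels, dim, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, mels):
        _, h = self.rnn(mels)                       # h: (1, batch, dim)
        return F.normalize(self.proj(h[-1]), dim=-1)

class TextEncoder(nn.Module):
    """Maps a keyword as character ids to the same embedding space."""
    def __init__(self, vocab=30, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)

    def forward(self, chars):
        _, h = self.rnn(self.emb(chars))
        return F.normalize(h[-1], dim=-1)

audio_enc, text_enc = AudioEncoder(), TextEncoder()
# Enrollment: random character ids stand in for the user's keyword text.
keyword = text_enc(torch.randint(0, 30, (1, 8)))
# Detection: compare an audio-clip embedding against the enrolled keyword.
audio = audio_enc(torch.randn(1, 100, 40))          # ~1 s of log-mel frames
score = (audio * keyword).sum(-1)                   # cosine similarity
detected = score > 0.5                              # threshold is a free parameter
```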
no code implementations • CVPR 2024 • Youngjoon Jang, Ji-Hoon Kim, Junseok Ahn, Doyeop Kwak, Hong-Sun Yang, Yoon-Cheol Ju, Il-Hwan Kim, Byeong-Yeol Kim, Joon Son Chung
The goal of this work is to simultaneously generate natural talking faces and speech outputs from text.
no code implementations • 23 Jan 2024 • Younglo Lee, Shukjae Choi, Byeong-Yeol Kim, Zhong-Qiu Wang, Shinji Watanabe
We propose a novel speech separation model designed to separate mixtures with an unknown number of speakers.
Ranked #1 on Speech Separation on WSJ0-5mix
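The abstract does not spell out the mechanism; one generic recipe for unknown-count separation (not the paper's architecture) is to predict masks for up to a maximum number of speakers alongside a count classifier, then keep only the predicted number of sources:

```python
import torch
import torch.nn as nn

class CountAwareSeparator(nn.Module):
    """Separates up to `max_spk` sources and predicts how many are active.
    A generic sketch: one head emits per-speaker masks, another classifies
    the speaker count; at inference we keep the predicted number of sources."""
    def __init__(self, dim=128, max_spk=5):
        super().__init__()
        self.max_spk = max_spk
        self.encoder = nn.Conv1d(1, dim, kernel_size=16, stride=8)
        self.masker = nn.Conv1d(dim, dim * max_spk, kernel_size=1)
        self.counter = nn.Linear(dim, max_spk)      # classifies 1..max_spk
        self.decoder = nn.ConvTranspose1d(dim, 1, kernel_size=16, stride=8)

    def forward(self, wav):                         # wav: (batch, 1, samples)
        feats = self.encoder(wav)                   # (batch, dim, frames)
        masks = self.masker(feats).sigmoid()
        masks = masks.view(wav.size(0), self.max_spk, -1, feats.size(-1))
        sources = self.decoder((masks * feats.unsqueeze(1)).flatten(0, 1))
        count_logits = self.counter(feats.mean(-1))
        return sources.view(wav.size(0), self.max_spk, -1), count_logits

model = CountAwareSeparator()
sources, count_logits = model(torch.randn(2, 1, 8000))
n_spk = count_logits.argmax(-1) + 1                 # keep the first n_spk sources
```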
no code implementations • 18 Apr 2023 • Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe
We propose FSB-LSTM, a novel long short-term memory (LSTM)-based architecture that integrates full- and sub-band (FSB) modeling for single- and multi-channel speech enhancement in the short-time Fourier transform (STFT) domain.
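A sketch of the general full-/sub-band idea only, not the paper's exact FSB-LSTM layout: a sub-band LSTM models each frequency bin over time, while a full-band LSTM models each frame across frequency, both operating on STFT features:

```python
import torch
import torch.nn as nn

class FullSubBlock(nn.Module):
    """One full-band + sub-band block over STFT features (batch, freq, time, dim).
    Generic sketch: the sub-band LSTM runs along time per frequency, the
    full-band LSTM runs along frequency per frame, with residual connections."""
    def __init__(self, dim=32):
        super().__init__()
        self.sub = nn.LSTM(dim, dim, batch_first=True)   # along time, per freq
        self.full = nn.LSTM(dim, dim, batch_first=True)  # along freq, per frame

    def forward(self, x):                   # x: (batch, freq, time, dim)
        b, f, t, d = x.shape
        y, _ = self.sub(x.reshape(b * f, t, d))          # temporal modeling
        x = x + y.reshape(b, f, t, d)
        y, _ = self.full(x.transpose(1, 2).reshape(b * t, f, d))  # spectral
        return x + y.reshape(b, t, f, d).transpose(1, 2)

# Complex STFT of a noisy waveform -> stack real/imag and project to `dim`.
wav = torch.randn(2, 16000)
spec = torch.stft(wav, n_fft=512, hop_length=256,
                  window=torch.hann_window(512), return_complex=True)
feats = torch.view_as_real(spec)                         # (2, 257, 63, 2)
proj = nn.Linear(2, 32)
out = FullSubBlock()(proj(feats))                        # (2, 257, 63, 32)
```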
no code implementations • 6 Apr 2023 • Youngjoon Jang, Kyeongha Rho, Jong-Bin Woo, Hyeongkeun Lee, Jihwan Park, Youshin Lim, Byeong-Yeol Kim, Joon Son Chung
The goal of this paper is to synthesise talking faces with controllable facial motions.
no code implementations • 29 Mar 2023 • Jinseok Park, Hyung Yong Kim, Jihwan Park, Byeong-Yeol Kim, Shukjae Choi, Yunkyu Lim
Language identification (LID) automatically recognizes the language of a spoken utterance.
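A minimal sketch of utterance-level language classification, assuming pooled acoustic features and a PyTorch-style model; all names and sizes are illustrative, not the paper's system:

```python
import torch
import torch.nn as nn

class SimpleLID(nn.Module):
    """Generic utterance-level language classifier: acoustic frames are
    encoded, mean-pooled over time, and classified into `n_langs` languages."""
    def __init__(self, n_mels=80, n_langs=8, dim=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(n_mels, dim, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(dim, dim, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.head = nn.Linear(dim, n_langs)

    def forward(self, mels):                        # mels: (batch, n_mels, time)
        pooled = self.encoder(mels).mean(dim=-1)    # average over time
        return self.head(pooled)                    # (batch, n_langs) logits

logits = SimpleLID()(torch.randn(4, 80, 300))       # 4 utterances, ~3 s each
language = logits.argmax(-1)                        # predicted language ids
```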
no code implementations • 28 Feb 2023 • Ji-Hoon Kim, Hong-Sun Yang, Yoon-Cheol Ju, Il-Hwan Kim, Byeong-Yeol Kim
In this paper, we propose CrossSpeech, which improves the quality of cross-lingual speech by effectively disentangling speaker and language information at the level of the acoustic feature space.
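One standard tool for this kind of disentanglement is gradient reversal: an adversarial speaker classifier pushes the language/content encoder to discard speaker identity. This is a common technique offered as a sketch, not CrossSpeech's specific mechanism, and all names are assumptions:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, negated gradient in the backward pass:
    the classic trick for adversarially removing speaker information."""
    @staticmethod
    def forward(ctx, x):
        return x
    @staticmethod
    def backward(ctx, grad):
        return -grad

feat_dim, n_speakers = 256, 100
language_encoder = nn.GRU(80, feat_dim, batch_first=True)
speaker_adversary = nn.Linear(feat_dim, n_speakers)

mels = torch.randn(4, 120, 80)                  # (batch, time, n_mels)
feats, _ = language_encoder(mels)               # content/language features
rev = GradReverse.apply(feats.mean(1))          # reverse grads into the encoder
adv_loss = nn.functional.cross_entropy(
    speaker_adversary(rev), torch.randint(0, n_speakers, (4,)))
adv_loss.backward()  # the encoder is pushed to *hide* speaker identity
```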
no code implementations • 1 Nov 2022 • Jaemin Jung, Youkyum Kim, Jihwan Park, Youshin Lim, Byeong-Yeol Kim, Youngjoon Jang, Joon Son Chung
In particular, we make the following contributions: (1) we construct a large-scale keyword dataset from an existing speech corpus and propose a filtering method to remove data that degrade model training; (2) we propose a metric learning-based two-stage training strategy, and demonstrate that it improves performance on the user-defined keyword spotting task by enriching keyword representations; (3) to facilitate fair comparison in the user-defined KWS field, we propose a unified evaluation protocol and metrics. A sketch of what a metric learning objective of this flavor can look like follows below.
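A minimal sketch of a cross-modal metric-learning loss for matched (audio, text) keyword pairs, assuming an InfoNCE-style formulation; the paper's actual loss and training stages may differ:

```python
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """InfoNCE-style loss over a batch of matched (audio, text) keyword pairs.
    The i-th audio clip and i-th enrolled text are positives; all other
    pairings in the batch act as negatives."""
    audio_emb = F.normalize(audio_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = audio_emb @ text_emb.t() / temperature     # (batch, batch)
    targets = torch.arange(audio_emb.size(0))
    # Symmetrize: audio-to-text and text-to-audio retrieval directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

loss = cross_modal_contrastive_loss(torch.randn(16, 128), torch.randn(16, 128))
```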