no code implementations • 12 Jul 2024 • Yeona Hong, Woo-Jin Chung, Hong-Goo Kang
By analyzing performance changes and model size reductions in each layer of a pre-trained speaker verification model, we have effectively minimized performance degradation while significantly reducing the model size.
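A minimal sketch (not the authors' code) of the layer-wise sensitivity analysis this abstract describes: prune each layer of a pre-trained model in isolation and record the metric change, so layers with small deltas can be compressed aggressively. The `eval_fn` callback and the L1 pruning criterion are assumptions for illustration.

```python
import copy
import torch
import torch.nn.utils.prune as prune

def layerwise_sensitivity(model, eval_fn, amount=0.5):
    """Prune each Linear/Conv1d layer in isolation; report the metric change.

    eval_fn: callable returning a scalar metric (e.g., EER) for a model.
    """
    baseline = eval_fn(model)
    report = {}
    for name, module in model.named_modules():
        if isinstance(module, (torch.nn.Linear, torch.nn.Conv1d)):
            pruned = copy.deepcopy(model)
            target = dict(pruned.named_modules())[name]
            prune.l1_unstructured(target, name="weight", amount=amount)
            report[name] = eval_fn(pruned) - baseline  # degradation per layer
    return report  # layers with small deltas are safe to compress
```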
1 code implementation • 25 Jun 2024 • Woo-Jin Chung, Hong-Goo Kang
We present a novel speaker-independent acoustic-to-articulatory inversion (AAI) model, overcoming the limitations observed in conventional AAI models that rely on acoustic features derived from restricted datasets.
no code implementations • 18 Jun 2024 • Miseul Kim, Soo-Whan Chung, Youna Ji, Hong-Goo Kang, Min-Seok Choi
This paper introduces a novel task in generative speech processing, Acoustic Scene Transfer (AST), which aims to transfer acoustic scenes of speech signals to diverse environments.
no code implementations • 14 Jun 2024 • Jihyun Kim, Stijn Kindt, Nilesh Madhu, Hong-Goo Kang
Ad-hoc distributed microphone environments, where microphone locations and numbers are unpredictable, present a challenge to traditional deep learning models, which typically require fixed architectures.
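One common way to handle an unpredictable microphone count, sketched below under assumed shapes and module choices (this is not the paper's architecture): encode each channel independently with shared weights, then pool across channels, which makes the network invariant to both the number and the ordering of microphones.

```python
import torch
import torch.nn as nn

class ChannelInvariantEncoder(nn.Module):
    def __init__(self, feat_dim=80, hidden=256):
        super().__init__()
        self.per_channel = nn.GRU(feat_dim, hidden, batch_first=True)

    def forward(self, x):
        # x: (batch, n_mics, time, feat_dim), with n_mics varying per call
        b, m, t, f = x.shape
        h, _ = self.per_channel(x.reshape(b * m, t, f))  # shared encoder
        h = h.reshape(b, m, t, -1)
        return h.mean(dim=1)  # mean pooling: permutation/count invariant

enc = ChannelInvariantEncoder()
print(enc(torch.randn(2, 3, 50, 80)).shape)  # works for any mic count
```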
no code implementations • 21 Dec 2023 • Miseul Kim, Zhenyu Piao, Jihyun Lee, Hong-Goo Kang
In this paper, we propose a neural articulation-to-speech (ATS) framework that synthesizes high-quality speech from articulatory signals in a multi-speaker setting.
no code implementations • 21 Dec 2023 • Miseul Kim, Minh Tri Ho, Hong-Goo Kang
In this paper, we propose an anomaly detection algorithm for machine sounds with a deep complex network trained by self-supervision.
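A rough sketch of the self-supervised anomaly-scoring idea: train a network to reconstruct normal machine-sound spectra, then flag frames with high reconstruction error at test time. The real-valued autoencoder here is only a stand-in for the paper's deep complex network.

```python
import torch
import torch.nn as nn

auto = nn.Sequential(nn.Linear(257, 64), nn.ReLU(), nn.Linear(64, 257))

spec = torch.rand(100, 257)                         # magnitude spectrogram frames
recon = auto(spec)
anomaly_score = (recon - spec).pow(2).mean(dim=-1)  # per-frame error
print(anomaly_score.mean())  # threshold this to detect anomalous sounds
```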
no code implementations • 21 Dec 2023 • Miseul Kim, Zhenyu Piao, Jihyun Lee, Hong-Goo Kang
Specifically, we train an encoder module to map ECoG signals to latent embeddings that match Wav2Vec 2.0 representations of the corresponding spoken speech.
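A hedged sketch of this feature-matching setup: an encoder maps ECoG signals to embeddings trained to match Wav2Vec 2.0 features of the parallel audio. The encoder, shapes, and L1 matching loss are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

ecog_encoder = nn.Sequential(                       # hypothetical stand-in encoder
    nn.Conv1d(64, 256, kernel_size=5, padding=2),
    nn.GELU(),
    nn.Conv1d(256, 768, kernel_size=5, padding=2),  # 768 = wav2vec2-base dim
)

ecog = torch.randn(4, 64, 100)            # (batch, electrodes, time)
wav2vec_feats = torch.randn(4, 768, 100)  # precomputed targets from the audio

pred = ecog_encoder(ecog)
loss = F.l1_loss(pred, wav2vec_feats)     # match the latent representations
loss.backward()
```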
1 code implementation • 1 Nov 2023 • Woo-Jin Chung, Miseul Kim, Hong-Goo Kang
This report describes our submission to the BHI 2023 Data Competition: Sensor challenge.
no code implementations • 28 Aug 2023 • Hyungchan Yoon, ChangHwan Kim, Eunwoo Song, Hyun-Wook Yoon, Hong-Goo Kang
To this end, the baseline TTS model needs to be amply generalized to out-of-domain data (i.e., the target speaker's speech).
no code implementations • Interspeech 2023 • Jihyun Kim, Hong-Goo Kang
Recent studies on music source separation have extended their applicability to generic audio signals.
Ranked #13 on Music Source Separation on MUSDB18
1 code implementation • 16 Jun 2023 • Woo-Jin Chung, Doyeon Kim, Soo-Whan Chung, Hong-Goo Kang
We introduce Multi-level feature Fusion-based Periodicity Analysis Model (MF-PAM), a novel deep learning-based pitch estimation model that accurately estimates pitch trajectory in noisy and reverberant acoustic environments.
no code implementations • 14 Jun 2023 • Hejung Yang, Hong-Goo Kang
Large representation models pre-trained with self-supervised learning have gained popularity in various fields of machine learning because they can extract high-quality salient features from input data.
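As a short illustration of the usage pattern this abstract refers to, the snippet below extracts features from one such pre-trained model, using torchaudio's wav2vec 2.0 bundle as a concrete (assumed-available) example; the paper itself is not tied to this particular checkpoint.

```python
import torch
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_BASE
model = bundle.get_model().eval()

waveform = torch.randn(1, bundle.sample_rate)       # 1 second of dummy audio
with torch.inference_mode():
    features, _ = model.extract_features(waveform)  # one tensor per layer
print(len(features), features[-1].shape)            # (1, frames, 768)
```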
no code implementations • 2 Jun 2023 • Doyeon Kim, Soo-Whan Chung, Hyewon Han, Youna Ji, Hong-Goo Kang
This paper introduces an end-to-end neural speech restoration model, HD-DEMUCS, demonstrating efficacy across multiple distortion environments.
1 code implementation • 30 Jun 2022 • Hyeon-Kyeong Shin, Hyewon Han, Doyeon Kim, Soo-Whan Chung, Hong-Goo Kang
In this paper, we propose a novel end-to-end user-defined keyword spotting method that utilizes linguistically corresponding patterns between speech and text sequences.
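A toy sketch of the speech-text correspondence idea: compare frame-level speech embeddings against token embeddings of an arbitrary enrolled keyword and score the best alignment. The embeddings and scoring rule here are assumptions, not the paper's modules.

```python
import torch
import torch.nn.functional as F

speech = F.normalize(torch.randn(80, 128), dim=-1)  # (frames, dim)
text = F.normalize(torch.randn(7, 128), dim=-1)     # (keyword tokens, dim)

sim = speech @ text.t()               # (frames, tokens) correspondence pattern
score = sim.max(dim=0).values.mean()  # every token should match some frame
print(float(score))                   # higher => keyword likely present
```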
no code implementations • 9 May 2022 • Sangshin Oh, Seyun Um, Hong-Goo Kang
In this work, we present a relaxed categorical analytic bound (ReCAB), a novel divergence-like metric which corresponds to the upper bound of the Kullback-Leibler divergence (KLD) of a relaxed categorical distribution.
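For context (this is not the ReCAB formula itself), the snippet below draws the relaxed categorical (Gumbel-Softmax) samples in question; the KLD between such relaxed distributions lacks a closed form, which is what motivates an analytic upper bound like ReCAB.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, 10)                             # unnormalized class scores
z = F.gumbel_softmax(logits, tau=0.5)                   # soft, differentiable sample
z_hard = F.gumbel_softmax(logits, tau=0.5, hard=True)   # straight-through variant
print(z.sum(dim=-1))                                    # each row sums to 1
```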
no code implementations • 24 Feb 2022 • Doyeon Kim, Hyewon Han, Hyeon-Kyeong Shin, Soo-Whan Chung, Hong-Goo Kang
Modern neural speech enhancement models usually include various forms of phase information in their training loss terms, either explicitly or implicitly.
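An illustrative sketch of what "explicit" phase information in a loss can look like: penalize the complex STFT difference (which depends on phase) alongside a phase-blind magnitude term. The weighting and FFT settings are assumptions.

```python
import torch

def phase_aware_loss(est, ref, n_fft=512, hop=128, alpha=0.5):
    window = torch.hann_window(n_fft)
    E = torch.stft(est, n_fft, hop, window=window, return_complex=True)
    R = torch.stft(ref, n_fft, hop, window=window, return_complex=True)
    mag_term = (E.abs() - R.abs()).abs().mean()  # magnitude only, phase-blind
    complex_term = (E - R).abs().mean()          # sensitive to phase errors
    return alpha * mag_term + (1 - alpha) * complex_term

print(phase_aware_loss(torch.randn(2, 16000), torch.randn(2, 16000)))
```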
no code implementations • 26 Jul 2021 • Se-Yun Um, Jihyun Kim, Jihyun Lee, Hong-Goo Kang
In this paper, we propose a multi-speaker face-to-speech waveform generation model that also works for unseen speaker conditions.
Automatic Speech Recognition (ASR) +3
no code implementations • CVPR 2021 • Jiyoung Lee, Soo-Whan Chung, Sunok Kim, Hong-Goo Kang, Kwanghoon Sohn
In this paper, we address the problem of separating individual speech signals from videos using audio-visual neural processing.
no code implementations • 4 Aug 2020 • Hyewon Han, Soo-Whan Chung, Hong-Goo Kang
Many approaches can derive a single speaker's identity from speech by learning to recognize consistent characteristics in its acoustic parameters.
no code implementations • 14 May 2020 • Soo-Whan Chung, Soyeon Choe, Joon Son Chung, Hong-Goo Kang
The objective of this paper is to separate a target speaker's speech from a mixture of two speakers using a deep audio-visual speech separation network.
1 code implementation • 31 Jan 2020 • Min-Jae Hwang, Eunwoo Song, Ryuichi Yamamoto, Frank Soong, Hong-Goo Kang
In this paper, we propose an improved LPCNet vocoder using a linear prediction (LP)-structured mixture density network (MDN).
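A generic mixture density network (MDN) head, the building block the LP-MDN structures around linear prediction; this standalone sketch only shows an MDN predicting Gaussian mixture parameters per time step, not the LP coupling itself.

```python
import torch
import torch.nn as nn

class MDNHead(nn.Module):
    def __init__(self, in_dim=256, n_mix=4):
        super().__init__()
        self.proj = nn.Linear(in_dim, 3 * n_mix)  # weight, mean, log-scale
        self.n_mix = n_mix

    def forward(self, h):
        w, mu, log_s = self.proj(h).chunk(3, dim=-1)
        return w.log_softmax(dim=-1), mu, log_s.exp()

head = MDNHead()
log_w, mu, scale = head(torch.randn(2, 100, 256))
print(log_w.shape, mu.shape, scale.shape)  # (2, 100, 4) each
```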
1 code implementation • 21 May 2019 • Ohsung Kwon, Eunwoo Song, Jae-Min Kim, Hong-Goo Kang
In this paper, we propose a high-quality generative text-to-speech (TTS) system using an effective spectrum and excitation estimation method.
no code implementations • 9 Nov 2018 • Eunwoo Song, Kyungguen Byun, Hong-Goo Kang
Conventional WaveNet-based neural vocoding systems significantly improve the perceptual quality of synthesized speech by statistically generating a time sequence of speech waveform samples through an auto-regressive framework.
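A toy sketch of that auto-regressive framework: each new sample is drawn conditioned on previously generated samples. A real WaveNet uses stacks of dilated convolutions; here a tiny GRU stands in for the predictive model.

```python
import torch
import torch.nn as nn

model = nn.GRU(1, 64, batch_first=True)
readout = nn.Linear(64, 1)

samples, hidden = [torch.zeros(1, 1, 1)], None
with torch.no_grad():
    for _ in range(100):                          # generate 100 samples
        out, hidden = model(samples[-1], hidden)  # condition on the past
        samples.append(torch.tanh(readout(out)))  # next waveform sample
waveform = torch.cat(samples, dim=1).squeeze()
print(waveform.shape)  # torch.Size([101])
```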
no code implementations • 8 Nov 2018 • Eunwoo Song, Jin-Seob Kim, Kyungguen Byun, Hong-Goo Kang
To generate more natural speech signals with the constraint of limited training data, we propose a speaker adaptation task with an effective variation of neural vocoding models.
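A minimal sketch of one standard adaptation recipe consistent with this abstract (the actual variation proposed in the paper may differ): start from a vocoder trained on many speakers, freeze the shared front-end, and fine-tune the remaining layers on the target speaker's limited data. The module layout is illustrative.

```python
import torch
import torch.nn as nn

vocoder = nn.Sequential(                       # stand-in for a trained vocoder
    nn.Conv1d(80, 256, 3, padding=1), nn.ReLU(),
    nn.Conv1d(256, 1, 3, padding=1))

for p in vocoder[0].parameters():              # freeze the shared front-end
    p.requires_grad = False

opt = torch.optim.Adam(
    (p for p in vocoder.parameters() if p.requires_grad), lr=1e-4)
# ...then fine-tune for a few steps on the target speaker's limited data
```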
no code implementations • 21 Sep 2018 • Soo-Whan Chung, Joon Son Chung, Hong-Goo Kang
This paper proposes a new strategy for learning powerful cross-modal embeddings for audio-to-video synchronization.
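A sketch of the cross-modal synchronization idea: audio and video segments from the same time window should map to nearby embeddings, while offset pairs should not. The placeholder linear encoders and the InfoNCE-style loss below are assumptions standing in for the paper's networks and training objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

audio_enc, video_enc = nn.Linear(40, 128), nn.Linear(512, 128)

audio = torch.randn(16, 40)     # 16 temporally aligned audio/video segments
video = torch.randn(16, 512)
a = F.normalize(audio_enc(audio), dim=-1)
v = F.normalize(video_enc(video), dim=-1)

logits = a @ v.t() / 0.07       # pairwise similarities, temperature-scaled
labels = torch.arange(16)       # matching pairs lie on the diagonal
loss = F.cross_entropy(logits, labels)
loss.backward()
```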