Search Results for author: Hong-Goo Kang

Found 21 papers, 5 papers with code

BrainTalker: Low-Resource Brain-to-Speech Synthesis with Transfer Learning using Wav2Vec 2.0

no code implementations21 Dec 2023 Miseul Kim, Zhenyu Piao, Jihyun Lee, Hong-Goo Kang

Specifically, we train an encoder module to map ECoG signals to latent embeddings that match Wav2Vec 2.0 representations of the corresponding spoken speech.
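The abstract describes a feature-matching objective: an encoder is trained so that its ECoG embeddings land close to the Wav2Vec 2.0 features of the same utterance. Below is a minimal toy sketch of that idea with a hypothetical linear encoder and random stand-in features; the real system uses recorded ECoG and a pretrained wav2vec 2.0 model, neither of which appears here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: ECoG frames (T x 64 channels) and target Wav2Vec 2.0
# embeddings (T x 768) for the same utterance. These are random arrays,
# not real neural or speech data.
T, ecog_dim, w2v_dim = 50, 64, 768
ecog = rng.standard_normal((T, ecog_dim))
target = rng.standard_normal((T, w2v_dim))

# Hypothetical linear encoder: a single weight matrix mapping ECoG
# frames into the wav2vec embedding space.
W = np.zeros((ecog_dim, w2v_dim))

def mse(pred, tgt):
    return float(np.mean((pred - tgt) ** 2))

# A few steps of plain gradient descent on the feature-matching loss.
lr = 1e-2
losses = []
for _ in range(100):
    pred = ecog @ W
    losses.append(mse(pred, target))
    grad = 2.0 / (T * w2v_dim) * ecog.T @ (pred - target)
    W -= lr * grad

# The matching loss decreases as the encoder learns the mapping.
```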

Speech Synthesis Transfer Learning

Style Modeling for Multi-Speaker Articulation-to-Speech

no code implementations21 Dec 2023 Miseul Kim, Zhenyu Piao, Jihyun Lee, Hong-Goo Kang

In this paper, we propose a neural articulation-to-speech (ATS) framework that synthesizes high-quality speech from articulatory signals in a multi-speaker setting.

Self-supervised Complex Network for Machine Sound Anomaly Detection

no code implementations21 Dec 2023 Miseul Kim, Minh Tri Ho, Hong-Goo Kang

In this paper, we propose an anomaly detection algorithm for machine sounds with a deep complex network trained by self-supervision.
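A common way to use self-supervision for machine-sound anomaly detection is to train a classifier on an auxiliary task (such as predicting which machine produced a clip) and score anomalies by how unconfident the classifier is on the clip's own label. The sketch below illustrates that scoring idea with hand-picked toy logits; it is an assumption-laden illustration of the general recipe, not the paper's complex-network model.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def anomaly_score(logits, machine_id):
    """Self-supervised scoring: a classifier is trained to predict which
    machine produced a clip; at test time, low confidence in the clip's
    own machine ID suggests the sound deviates from normal operation."""
    probs = softmax(logits)
    return float(-np.log(probs[machine_id] + 1e-12))

# Toy logits from a hypothetical 4-machine classifier.
normal_logits = np.array([4.0, 0.1, -0.3, 0.2])   # confident: machine 0
abnormal_logits = np.array([0.5, 0.4, 0.6, 0.3])  # diffuse, uncertain

s_normal = anomaly_score(normal_logits, machine_id=0)
s_abnormal = anomaly_score(abnormal_logits, machine_id=0)
```

The abnormal clip receives the higher score because the classifier's probability mass is spread across machines instead of concentrating on the true one.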

Anomaly Detection Time Series

Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech

no code implementations28 Aug 2023 Hyungchan Yoon, ChangHwan Kim, Eunwoo Song, Hyun-Wook Yoon, Hong-Goo Kang

To this end, the baseline TTS model needs to be sufficiently generalized to out-of-domain data (i.e., the target speaker's speech).

Domain Generalization Zero-Shot Multi-Speaker TTS

MF-PAM: Accurate Pitch Estimation through Periodicity Analysis and Multi-level Feature Fusion

1 code implementation16 Jun 2023 Woo-Jin Chung, Doyeon Kim, Soo-Whan Chung, Hong-Goo Kang

We introduce Multi-level feature Fusion-based Periodicity Analysis Model (MF-PAM), a novel deep learning-based pitch estimation model that accurately estimates pitch trajectory in noisy and reverberant acoustic environments.
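For context on the periodicity analysis that MF-PAM builds on, here is the classical autocorrelation pitch estimator: pick the lag at which a frame is most self-similar within a plausible pitch range and convert it to Hz. This is a textbook baseline, not MF-PAM itself, and it degrades in exactly the noisy and reverberant conditions the paper targets.

```python
import numpy as np

def autocorr_pitch(frame, sr, fmin=60.0, fmax=400.0):
    """Classical autocorrelation pitch estimate (a baseline, not MF-PAM):
    choose the lag with the strongest self-similarity inside the
    plausible pitch range and convert that lag to a frequency."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo = int(sr / fmax)          # shortest candidate lag
    hi = int(sr / fmin)          # longest candidate lag
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

sr = 16000
t = np.arange(int(0.04 * sr)) / sr          # one 40 ms frame
frame = np.sin(2 * np.pi * 200.0 * t)       # clean 200 Hz tone

f0 = autocorr_pitch(frame, sr)
```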

Audio Signal Processing

Feature Normalization for Fine-tuning Self-Supervised Models in Speech Enhancement

no code implementations14 Jun 2023 Hejung Yang, Hong-Goo Kang

Large, pre-trained representation models trained using self-supervised learning have gained popularity in various fields of machine learning because they are able to extract high-quality salient features from input data.

Self-Supervised Learning Speech Enhancement +2

HD-DEMUCS: General Speech Restoration with Heterogeneous Decoders

no code implementations2 Jun 2023 Doyeon Kim, Soo-Whan Chung, Hyewon Han, Youna Ji, Hong-Goo Kang

This paper introduces an end-to-end neural speech restoration model, HD-DEMUCS, demonstrating efficacy across multiple distortion environments.

Learning Audio-Text Agreement for Open-vocabulary Keyword Spotting

1 code implementation30 Jun 2022 Hyeon-Kyeong Shin, Hyewon Han, Doyeon Kim, Soo-Whan Chung, Hong-Goo Kang

In this paper, we propose a novel end-to-end user-defined keyword spotting method that utilizes linguistically corresponding patterns between speech and text sequences.

Keyword Spotting

ReCAB-VAE: Gumbel-Softmax Variational Inference Based on Analytic Divergence

no code implementations9 May 2022 Sangshin Oh, Seyun Um, Hong-Goo Kang

In this work, we present a relaxed categorical analytic bound (ReCAB), a novel divergence-like metric which corresponds to the upper bound of the Kullback-Leibler divergence (KLD) of a relaxed categorical distribution.
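The relaxed categorical distribution at the heart of this work is the standard Gumbel-Softmax construction: add Gumbel noise to the logits and pass the result through a temperature-scaled softmax, so that low temperatures approach hard one-hot samples. A minimal numpy sketch of that sampler (not of the ReCAB bound itself):

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits, temperature, rng):
    """Draw a relaxed one-hot sample from a categorical distribution.
    As temperature -> 0 the sample approaches a hard one-hot vector;
    higher temperatures give smoother, more uniform samples."""
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0,1)
    z = (logits + g) / temperature
    z = z - z.max()                 # numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
soft = gumbel_softmax(logits, temperature=5.0, rng=rng)
hard = gumbel_softmax(logits, temperature=0.05, rng=rng)
```

Both samples lie on the probability simplex; the low-temperature one is sharply peaked, which is why the relaxation can stand in for discrete sampling during gradient-based training.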

Speech Synthesis Text-To-Speech Synthesis +1

Phase Continuity: Learning Derivatives of Phase Spectrum for Speech Enhancement

no code implementations24 Feb 2022 Doyeon Kim, Hyewon Han, Hyeon-Kyeong Shin, Soo-Whan Chung, Hong-Goo Kang

Modern neural speech enhancement models usually include various forms of phase information in their training loss terms, either explicitly or implicitly.
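The phase-derivative quantities this line of work builds losses on are finite differences of the unwrapped STFT phase: along frequency (related to group delay) and along time (related to instantaneous frequency). The sketch below computes both for a toy sinusoid with a hand-rolled STFT; it only illustrates the quantities, not the paper's loss formulation.

```python
import numpy as np

# STFT phase for a toy 440 Hz tone, then finite-difference derivatives
# of the phase spectrum along frequency and along time frames.
sr, n_fft, hop = 16000, 512, 128
t = np.arange(sr // 10) / sr                 # 100 ms of signal
x = np.sin(2 * np.pi * 440.0 * t)

frames = []
for start in range(0, len(x) - n_fft + 1, hop):
    seg = x[start:start + n_fft] * np.hanning(n_fft)
    frames.append(np.fft.rfft(seg))
spec = np.array(frames)                      # (num_frames, n_fft//2 + 1)
phase = np.angle(spec)

# Unwrap before differencing so 2*pi jumps do not dominate the result.
dphase_freq = np.diff(np.unwrap(phase, axis=1), axis=1)  # ~ group delay
dphase_time = np.diff(np.unwrap(phase, axis=0), axis=0)  # ~ inst. freq.
```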

Speech Enhancement

MIRNet: Learning multiple identities representations in overlapped speech

no code implementations4 Aug 2020 Hyewon Han, Soo-Whan Chung, Hong-Goo Kang

Many approaches can derive information about a single speaker's identity from speech by learning to recognize consistent characteristics of acoustic parameters.

RGB-T Tracking Speaker Verification +1

FaceFilter: Audio-visual speech separation using still images

no code implementations14 May 2020 Soo-Whan Chung, Soyeon Choe, Joon Son Chung, Hong-Goo Kang

The objective of this paper is to separate a target speaker's speech from a mixture of two speakers using a deep audio-visual speech separation network.

Speech Separation

Improving LPCNet-based Text-to-Speech with Linear Prediction-structured Mixture Density Network

1 code implementation31 Jan 2020 Min-Jae Hwang, Eunwoo Song, Ryuichi Yamamoto, Frank Soong, Hong-Goo Kang

In this paper, we propose an improved LPCNet vocoder using a linear prediction (LP)-structured mixture density network (MDN).
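The linear prediction structure that LPCNet-style vocoders exploit splits speech into an LP filter (spectral envelope) plus a residual excitation, which is the part the neural model focuses on. The toy sketch below generates an AR(2) signal with known coefficients and verifies that LP analysis recovers the excitation; the real vocoder estimates LP coefficients from speech and models the residual with a network, none of which is shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy AR(2) "speech" signal with known LP coefficients.
a = np.array([1.3, -0.4])            # stable filter (roots 0.8 and 0.5)
n = 2000
e = 0.1 * rng.standard_normal(n)     # excitation (white noise here)
s = np.zeros(n)
for i in range(2, n):
    s[i] = a[0] * s[i - 1] + a[1] * s[i - 2] + e[i]

# LP analysis: predict each sample from its two predecessors; the
# residual is exactly the excitation for this synthetic signal.
pred = a[0] * s[1:-1] + a[1] * s[:-2]
residual = s[2:] - pred

# Residual energy is far below signal energy: the LP filter explains
# most of the waveform, leaving a compact excitation to model.
```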

Quantization Speech Synthesis

Effective parameter estimation methods for an ExcitNet model in generative text-to-speech systems

1 code implementation21 May 2019 Ohsung Kwon, Eunwoo Song, Jae-Min Kim, Hong-Goo Kang

In this paper, we propose a high-quality generative text-to-speech (TTS) system using an effective spectrum and excitation estimation method.

Speech Synthesis Text-To-Speech Synthesis

ExcitNet vocoder: A neural excitation model for parametric speech synthesis systems

no code implementations9 Nov 2018 Eunwoo Song, Kyungguen Byun, Hong-Goo Kang

Conventional WaveNet-based neural vocoding systems significantly improve the perceptual quality of synthesized speech by statistically generating a time sequence of speech waveforms through an auto-regressive framework.

Speech Synthesis

Speaker-adaptive neural vocoders for parametric speech synthesis systems

no code implementations8 Nov 2018 Eunwoo Song, Jin-Seob Kim, Kyungguen Byun, Hong-Goo Kang

To generate more natural speech signals with the constraint of limited training data, we propose a speaker adaptation task with an effective variation of neural vocoding models.

Speech Synthesis

Perfect match: Improved cross-modal embeddings for audio-visual synchronisation

no code implementations21 Sep 2018 Soo-Whan Chung, Joon Son Chung, Hong-Goo Kang

This paper proposes a new strategy for learning powerful cross-modal embeddings for audio-to-video synchronization.
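Synchronization with cross-modal embeddings typically works as a multi-way matching problem: one audio embedding is compared against video embeddings at several temporal offsets, and the truly synchronized offset should score highest. The sketch below mimics that test-time comparison with toy vectors; real embeddings would come from the two trained sub-networks, and the index names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy multi-way matching: video embeddings at 5 candidate offsets, and
# an audio embedding that is a noisy copy of the offset-2 embedding,
# standing in for the truly synchronized pair.
dim, n_offsets = 128, 5
video = rng.standard_normal((n_offsets, dim))
sync_idx = 2
audio = video[sync_idx] + 0.1 * rng.standard_normal(dim)

sims = np.array([cosine(audio, v) for v in video])
predicted = int(np.argmax(sims))     # offset with the highest similarity
```

Training sharpens exactly this ranking: a cross-entropy over the candidate similarities pushes the synchronized offset's score above the others.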

Binary Classification Cross-Modal Retrieval +4
