Search Results for author: Hong-Goo Kang

Found 25 papers, 6 papers with code

Optimization of DNN-based speaker verification model through efficient quantization technique

no code implementations12 Jul 2024 Yeona Hong, Woo-Jin Chung, Hong-Goo Kang

By analyzing performance changes and model size reductions in each layer of a pre-trained speaker verification model, we have effectively minimized performance degradation while significantly reducing the model size.

Quantization Speaker Verification
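The abstract above describes layer-wise weight quantization for model-size reduction. As a hedged illustration only (a generic post-training uniform quantization sketch, not the paper's actual technique), the basic trade-off can be shown with numpy: quantizing float32 weights to int8 shrinks storage 4x while introducing a small reconstruction error.

```python
import numpy as np

def quantize_uniform(w, bits=8):
    """Symmetric uniform quantization of a float32 weight array to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map integer codes back to approximate float32 weights."""
    return q.astype(np.float32) * scale

# Toy "layer": weights drawn to mimic a trained dense layer (illustrative only).
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(256, 256)).astype(np.float32)

q, scale = quantize_uniform(w, bits=8)
w_hat = dequantize(q, scale)

size_reduction = w.nbytes / q.nbytes          # float32 -> int8 gives 4x
mse = float(np.mean((w - w_hat) ** 2))        # quantization noise
```

Analyzing `mse` per layer, as the abstract suggests, indicates which layers tolerate aggressive quantization and which need higher precision.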

Speaker-Independent Acoustic-to-Articulatory Inversion through Multi-Channel Attention Discriminator

1 code implementation25 Jun 2024 Woo-Jin Chung, Hong-Goo Kang

We present a novel speaker-independent acoustic-to-articulatory inversion (AAI) model, overcoming the limitations observed in conventional AAI models that rely on acoustic features derived from restricted datasets.

Self-Supervised Learning

Speak in the Scene: Diffusion-based Acoustic Scene Transfer toward Immersive Speech Generation

no code implementations18 Jun 2024 Miseul Kim, Soo-Whan Chung, Youna Ji, Hong-Goo Kang, Min-Seok Choi

This paper introduces a novel task in generative speech processing, Acoustic Scene Transfer (AST), which aims to transfer acoustic scenes of speech signals to diverse environments.

Enhanced Deep Speech Separation in Clustered Ad Hoc Distributed Microphone Environments

no code implementations14 Jun 2024 Jihyun Kim, Stijn Kindt, Nilesh Madhu, Hong-Goo Kang

Ad-hoc distributed microphone environments, where microphone locations and numbers are unpredictable, present a challenge to traditional deep learning models, which typically require fixed architectures.

Deep Learning Speech Separation

Style Modeling for Multi-Speaker Articulation-to-Speech

no code implementations21 Dec 2023 Miseul Kim, Zhenyu Piao, Jihyun Lee, Hong-Goo Kang

In this paper, we propose a neural articulation-to-speech (ATS) framework that synthesizes high-quality speech from articulatory signals in a multi-speaker setting.


Self-supervised Complex Network for Machine Sound Anomaly Detection

no code implementations21 Dec 2023 Miseul Kim, Minh Tri Ho, Hong-Goo Kang

In this paper, we propose an anomaly detection algorithm for machine sounds with a deep complex network trained by self-supervision.

Anomaly Detection Time Series

BrainTalker: Low-Resource Brain-to-Speech Synthesis with Transfer Learning using Wav2Vec 2.0

no code implementations21 Dec 2023 Miseul Kim, Zhenyu Piao, Jihyun Lee, Hong-Goo Kang

Specifically, we train an encoder module to map ECoG signals to latent embeddings that match Wav2Vec 2.0 representations of the corresponding spoken speech.

Speech Synthesis Transfer Learning

Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech

no code implementations28 Aug 2023 Hyungchan Yoon, ChangHwan Kim, Eunwoo Song, Hyun-Wook Yoon, Hong-Goo Kang

To this end, the baseline TTS model needs to generalize well to out-of-domain data (i.e., the target speaker's speech).

Domain Generalization Text to Speech +1

MF-PAM: Accurate Pitch Estimation through Periodicity Analysis and Multi-level Feature Fusion

1 code implementation16 Jun 2023 Woo-Jin Chung, Doyeon Kim, Soo-Whan Chung, Hong-Goo Kang

We introduce Multi-level feature Fusion-based Periodicity Analysis Model (MF-PAM), a novel deep learning-based pitch estimation model that accurately estimates pitch trajectory in noisy and reverberant acoustic environments.

Audio Signal Processing
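MF-PAM above is a deep learning model, but the periodicity analysis it builds on has a classical baseline worth sketching. As a hedged illustration (an autocorrelation pitch estimator, not the MF-PAM architecture), the pitch of a voiced frame is the lag where the frame's autocorrelation peaks within the plausible pitch range:

```python
import numpy as np

def autocorr_pitch(frame, sr, fmin=60.0, fmax=400.0):
    """Estimate pitch (Hz) of one frame from the autocorrelation peak.
    Classical baseline only; MF-PAM itself is a learned multi-level model."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)   # search lags in the pitch range
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

sr = 16000
t = np.arange(int(0.04 * sr)) / sr            # one 40 ms frame
frame = np.sin(2 * np.pi * 120.0 * t)         # clean 120 Hz tone
f0 = autocorr_pitch(frame, sr)
```

This simple estimator degrades quickly in the noisy and reverberant conditions the abstract targets, which is the motivation for learned, feature-fusion approaches.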

Feature Normalization for Fine-tuning Self-Supervised Models in Speech Enhancement

no code implementations14 Jun 2023 Hejung Yang, Hong-Goo Kang

Large, pre-trained representation models trained using self-supervised learning have gained popularity in various fields of machine learning because they are able to extract high-quality salient features from input data.

Self-Supervised Learning Speech Enhancement +2

HD-DEMUCS: General Speech Restoration with Heterogeneous Decoders

no code implementations2 Jun 2023 Doyeon Kim, Soo-Whan Chung, Hyewon Han, Youna Ji, Hong-Goo Kang

This paper introduces an end-to-end neural speech restoration model, HD-DEMUCS, demonstrating efficacy across multiple distortion environments.

Decoder

Learning Audio-Text Agreement for Open-vocabulary Keyword Spotting

1 code implementation30 Jun 2022 Hyeon-Kyeong Shin, Hyewon Han, Doyeon Kim, Soo-Whan Chung, Hong-Goo Kang

In this paper, we propose a novel end-to-end user-defined keyword spotting method that utilizes linguistically corresponding patterns between speech and text sequences.

Keyword Spotting

ReCAB-VAE: Gumbel-Softmax Variational Inference Based on Analytic Divergence

no code implementations9 May 2022 Sangshin Oh, Seyun Um, Hong-Goo Kang

In this work, we present a relaxed categorical analytic bound (ReCAB), a novel divergence-like metric which corresponds to the upper bound of the Kullback-Leibler divergence (KLD) of a relaxed categorical distribution.

Speech Synthesis Text to Speech +2
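The ReCAB bound above concerns the relaxed categorical (Gumbel-Softmax) distribution. As a hedged sketch of that underlying distribution only (standard Gumbel-Softmax sampling, not the paper's ReCAB metric), a relaxed sample is drawn by perturbing logits with Gumbel noise and applying a temperature-scaled softmax:

```python
import numpy as np

def gumbel_softmax_sample(logits, tau, rng):
    """Draw one relaxed categorical sample on the probability simplex.
    tau -> 0 approaches a one-hot categorical draw."""
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0, 1) noise
    y = (logits + g) / tau
    y = np.exp(y - y.max())                               # stable softmax
    return y / y.sum()

rng = np.random.default_rng(0)
logits = np.log(np.array([0.7, 0.2, 0.1]))
sample = gumbel_softmax_sample(logits, tau=0.5, rng=rng)
```

The KLD of this relaxed distribution has no closed form, which is why an analytic upper bound such as ReCAB is useful for variational training.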

Phase Continuity: Learning Derivatives of Phase Spectrum for Speech Enhancement

no code implementations24 Feb 2022 Doyeon Kim, Hyewon Han, Hyeon-Kyeong Shin, Soo-Whan Chung, Hong-Goo Kang

Modern neural speech enhancement models usually include various forms of phase information in their training loss terms, either explicitly or implicitly.

Speech Enhancement
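The abstract above concerns learning derivatives of the phase spectrum. As a hedged illustration (generic phase-derivative features computed from an STFT, not the paper's loss formulation), the two standard derivatives are the wrapped phase difference along time (instantaneous-frequency-like) and along frequency (group-delay-like):

```python
import numpy as np

def phase_derivatives(stft):
    """Return time- and frequency-direction phase derivatives, wrapped to [-pi, pi)."""
    phase = np.angle(stft)
    def wrap(x):
        return (x + np.pi) % (2 * np.pi) - np.pi
    dphase_t = wrap(np.diff(phase, axis=1))   # along frames: instantaneous frequency
    dphase_f = wrap(np.diff(phase, axis=0))   # along bins: group delay
    return dphase_t, dphase_f

# Toy spectrogram: every bin advances by a constant 0.3 rad per frame.
freqs, frames = 5, 8
phase = np.outer(np.ones(freqs), 0.3 * np.arange(frames))
stft = np.exp(1j * phase)
dt, df = phase_derivatives(stft)
```

Wrapping matters because raw phase is only defined modulo 2π; derivative features like these are continuous where the raw phase spectrum is not, which is what makes them usable in a training loss.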

MIRNet: Learning multiple identities representations in overlapped speech

no code implementations4 Aug 2020 Hyewon Han, Soo-Whan Chung, Hong-Goo Kang

Many approaches can derive information about a single speaker's identity from the speech by learning to recognize consistent characteristics of acoustic parameters.

RGB-T Tracking Speaker Verification +1

FaceFilter: Audio-visual speech separation using still images

no code implementations14 May 2020 Soo-Whan Chung, Soyeon Choe, Joon Son Chung, Hong-Goo Kang

The objective of this paper is to separate a target speaker's speech from a mixture of two speakers using a deep audio-visual speech separation network.

Speech Separation

Improving LPCNet-based Text-to-Speech with Linear Prediction-structured Mixture Density Network

1 code implementation31 Jan 2020 Min-Jae Hwang, Eunwoo Song, Ryuichi Yamamoto, Frank Soong, Hong-Goo Kang

In this paper, we propose an improved LPCNet vocoder using a linear prediction (LP)-structured mixture density network (MDN).

Quantization Speech Synthesis +1
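The entry above replaces LPCNet's output layer with a mixture density network. As a hedged sketch of the MDN idea only (sampling from a 1-D Gaussian mixture; the paper's linear-prediction structure is omitted), generation with an MDN head means picking a mixture component by its weight and then sampling that component's Gaussian:

```python
import numpy as np

def mdn_sample(weights, means, log_stds, rng):
    """Sample one value from a 1-D Gaussian mixture: choose a component, then draw."""
    k = rng.choice(len(weights), p=weights)
    return rng.normal(means[k], np.exp(log_stds[k]))

# Toy MDN output: two sharp components at -1 and +1 (illustrative parameters).
rng = np.random.default_rng(1)
weights = np.array([0.5, 0.5])
means = np.array([-1.0, 1.0])
log_stds = np.array([-3.0, -3.0])
xs = np.array([mdn_sample(weights, means, log_stds, rng) for _ in range(1000)])
```

In a vocoder, such continuous mixture outputs avoid the coarse quantization of discrete waveform softmax outputs, which is one motivation given in the abstract.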

Effective parameter estimation methods for an ExcitNet model in generative text-to-speech systems

1 code implementation21 May 2019 Ohsung Kwon, Eunwoo Song, Jae-Min Kim, Hong-Goo Kang

In this paper, we propose a high-quality generative text-to-speech (TTS) system using an effective spectrum and excitation estimation method.

Speech Synthesis Text to Speech +1

ExcitNet vocoder: A neural excitation model for parametric speech synthesis systems

no code implementations9 Nov 2018 Eunwoo Song, Kyungguen Byun, Hong-Goo Kang

Conventional WaveNet-based neural vocoding systems significantly improve the perceptual quality of synthesized speech by statistically generating a time sequence of speech waveforms through an auto-regressive framework.

Speech Synthesis

Speaker-adaptive neural vocoders for parametric speech synthesis systems

no code implementations8 Nov 2018 Eunwoo Song, Jin-Seob Kim, Kyungguen Byun, Hong-Goo Kang

To generate more natural speech signals with the constraint of limited training data, we propose a speaker adaptation task with an effective variation of neural vocoding models.

Speech Synthesis Text to Speech

Perfect match: Improved cross-modal embeddings for audio-visual synchronisation

no code implementations21 Sep 2018 Soo-Whan Chung, Joon Son Chung, Hong-Goo Kang

This paper proposes a new strategy for learning powerful cross-modal embeddings for audio-to-video synchronization.

Binary Classification Cross-Modal Retrieval +4
