Search Results for author: Hong-Goo Kang

Found 21 papers, 5 papers with code

BrainTalker: Low-Resource Brain-to-Speech Synthesis with Transfer Learning using Wav2Vec 2.0

no code implementations21 Dec 2023 Miseul Kim, Zhenyu Piao, Jihyun Lee, Hong-Goo Kang

Specifically, we train an encoder module to map ECoG signals to latent embeddings that match Wav2Vec 2.0 representations of the corresponding spoken speech.
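The abstract describes a feature-matching objective: an encoder is trained so that its ECoG embeddings land close to the Wav2Vec 2.0 features of the same utterance. Below is a minimal toy sketch of that idea with a hypothetical linear encoder and random stand-in features; the real system uses recorded ECoG and a pretrained wav2vec 2.0 model, neither of which appears here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: ECoG frames (T x 64 channels) and target Wav2Vec 2.0
# embeddings (T x 768) for the same utterance. These are random arrays,
# not real neural or speech data.
T, ecog_dim, w2v_dim = 50, 64, 768
ecog = rng.standard_normal((T, ecog_dim))
target = rng.standard_normal((T, w2v_dim))

# Hypothetical linear encoder: a single weight matrix mapping ECoG
# frames into the wav2vec embedding space.
W = np.zeros((ecog_dim, w2v_dim))

def mse(pred, tgt):
    return float(np.mean((pred - tgt) ** 2))

# A few steps of plain gradient descent on the feature-matching loss.
lr = 1e-2
losses = []
for _ in range(100):
    pred = ecog @ W
    losses.append(mse(pred, target))
    grad = 2.0 / (T * w2v_dim) * ecog.T @ (pred - target)
    W -= lr * grad

# The matching loss decreases as the encoder learns the mapping.
```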

Speech Synthesis Transfer Learning

Style Modeling for Multi-Speaker Articulation-to-Speech

no code implementations21 Dec 2023 Miseul Kim, Zhenyu Piao, Jihyun Lee, Hong-Goo Kang

In this paper, we propose a neural articulation-to-speech (ATS) framework that synthesizes high-quality speech from articulatory signals in a multi-speaker setting.

Self-supervised Complex Network for Machine Sound Anomaly Detection

no code implementations21 Dec 2023 Miseul Kim, Minh Tri Ho, Hong-Goo Kang

In this paper, we propose an anomaly detection algorithm for machine sounds with a deep complex network trained by self-supervision.
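A common way to use self-supervision for machine-sound anomaly detection is to train a classifier on an auxiliary task (such as predicting which machine produced a clip) and score anomalies by how unconfident the classifier is on the clip's own label. The sketch below illustrates that scoring idea with hand-picked toy logits; it is an assumption-laden illustration of the general recipe, not the paper's complex-network model.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def anomaly_score(logits, machine_id):
    """Self-supervised scoring: a classifier is trained to predict which
    machine produced a clip; at test time, low confidence in the clip's
    own machine ID suggests the sound deviates from normal operation."""
    probs = softmax(logits)
    return float(-np.log(probs[machine_id] + 1e-12))

# Toy logits from a hypothetical 4-machine classifier.
normal_logits = np.array([4.0, 0.1, -0.3, 0.2])   # confident: machine 0
abnormal_logits = np.array([0.5, 0.4, 0.6, 0.3])  # diffuse, uncertain

s_normal = anomaly_score(normal_logits, machine_id=0)
s_abnormal = anomaly_score(abnormal_logits, machine_id=0)
```

The abnormal clip receives the higher score because the classifier's probability mass is spread across machines instead of concentrating on the true one.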

Anomaly Detection Time Series

Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech

no code implementations28 Aug 2023 Hyungchan Yoon, ChangHwan Kim, Eunwoo Song, Hyun-Wook Yoon, Hong-Goo Kang

To this end, the baseline TTS model needs to be sufficiently generalized to out-of-domain data (i.e., the target speaker's speech).

Domain Generalization Zero-Shot Multi-Speaker TTS

MF-PAM: Accurate Pitch Estimation through Periodicity Analysis and Multi-level Feature Fusion

1 code implementation16 Jun 2023 Woo-Jin Chung, Doyeon Kim, Soo-Whan Chung, Hong-Goo Kang

We introduce Multi-level feature Fusion-based Periodicity Analysis Model (MF-PAM), a novel deep learning-based pitch estimation model that accurately estimates pitch trajectory in noisy and reverberant acoustic environments.
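For context on the periodicity analysis that MF-PAM builds on, here is the classical autocorrelation pitch estimator: pick the lag at which a frame is most self-similar within a plausible pitch range and convert it to Hz. This is a textbook baseline, not MF-PAM itself, and it degrades in exactly the noisy and reverberant conditions the paper targets.

```python
import numpy as np

def autocorr_pitch(frame, sr, fmin=60.0, fmax=400.0):
    """Classical autocorrelation pitch estimate (a baseline, not MF-PAM):
    choose the lag with the strongest self-similarity inside the
    plausible pitch range and convert that lag to a frequency."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo = int(sr / fmax)          # shortest candidate lag
    hi = int(sr / fmin)          # longest candidate lag
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

sr = 16000
t = np.arange(int(0.04 * sr)) / sr          # one 40 ms frame
frame = np.sin(2 * np.pi * 200.0 * t)       # clean 200 Hz tone

f0 = autocorr_pitch(frame, sr)
```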

Audio Signal Processing

Feature Normalization for Fine-tuning Self-Supervised Models in Speech Enhancement

no code implementations14 Jun 2023 Hejung Yang, Hong-Goo Kang

Large, pre-trained representation models trained using self-supervised learning have gained popularity in various fields of machine learning because they are able to extract high-quality salient features from input data.

Self-Supervised Learning Speech Enhancement +2

HD-DEMUCS: General Speech Restoration with Heterogeneous Decoders

no code implementations2 Jun 2023 Doyeon Kim, Soo-Whan Chung, Hyewon Han, Youna Ji, Hong-Goo Kang

This paper introduces an end-to-end neural speech restoration model, HD-DEMUCS, demonstrating efficacy across multiple distortion environments.

Learning Audio-Text Agreement for Open-vocabulary Keyword Spotting

1 code implementation30 Jun 2022 Hyeon-Kyeong Shin, Hyewon Han, Doyeon Kim, Soo-Whan Chung, Hong-Goo Kang

In this paper, we propose a novel end-to-end user-defined keyword spotting method that utilizes linguistically corresponding patterns between speech and text sequences.

Keyword Spotting

ReCAB-VAE: Gumbel-Softmax Variational Inference Based on Analytic Divergence

no code implementations9 May 2022 Sangshin Oh, Seyun Um, Hong-Goo Kang

In this work, we present a relaxed categorical analytic bound (ReCAB), a novel divergence-like metric which corresponds to the upper bound of the Kullback-Leibler divergence (KLD) of a relaxed categorical distribution.
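The relaxed categorical distribution at the heart of this work is the standard Gumbel-Softmax construction: add Gumbel noise to the logits and pass the result through a temperature-scaled softmax, so that low temperatures approach hard one-hot samples. A minimal numpy sketch of that sampler (not of the ReCAB bound itself):

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits, temperature, rng):
    """Draw a relaxed one-hot sample from a categorical distribution.
    As temperature -> 0 the sample approaches a hard one-hot vector;
    higher temperatures give smoother, more uniform samples."""
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0,1)
    z = (logits + g) / temperature
    z = z - z.max()                 # numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
soft = gumbel_softmax(logits, temperature=5.0, rng=rng)
hard = gumbel_softmax(logits, temperature=0.05, rng=rng)
```

Both samples lie on the probability simplex; the low-temperature one is sharply peaked, which is why the relaxation can stand in for discrete sampling during gradient-based training.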

Speech Synthesis Text-To-Speech Synthesis +1

Phase Continuity: Learning Derivatives of Phase Spectrum for Speech Enhancement

no code implementations24 Feb 2022 Doyeon Kim, Hyewon Han, Hyeon-Kyeong Shin, Soo-Whan Chung, Hong-Goo Kang

Modern neural speech enhancement models usually include various forms of phase information in their training loss terms, either explicitly or implicitly.
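The phase-derivative quantities this line of work builds losses on are finite differences of the unwrapped STFT phase: along frequency (related to group delay) and along time (related to instantaneous frequency). The sketch below computes both for a toy sinusoid with a hand-rolled STFT; it only illustrates the quantities, not the paper's loss formulation.

```python
import numpy as np

# STFT phase for a toy 440 Hz tone, then finite-difference derivatives
# of the phase spectrum along frequency and along time frames.
sr, n_fft, hop = 16000, 512, 128
t = np.arange(sr // 10) / sr                 # 100 ms of signal
x = np.sin(2 * np.pi * 440.0 * t)

frames = []
for start in range(0, len(x) - n_fft + 1, hop):
    seg = x[start:start + n_fft] * np.hanning(n_fft)
    frames.append(np.fft.rfft(seg))
spec = np.array(frames)                      # (num_frames, n_fft//2 + 1)
phase = np.angle(spec)

# Unwrap before differencing so 2*pi jumps do not dominate the result.
dphase_freq = np.diff(np.unwrap(phase, axis=1), axis=1)  # ~ group delay
dphase_time = np.diff(np.unwrap(phase, axis=0), axis=0)  # ~ inst. freq.
```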

Speech Enhancement

MIRNet: Learning multiple identities representations in overlapped speech

no code implementations4 Aug 2020 Hyewon Han, Soo-Whan Chung, Hong-Goo Kang

Many approaches can derive information about a single speaker's identity from speech by learning to recognize consistent characteristics of acoustic parameters.

RGB-T Tracking Speaker Verification +1

FaceFilter: Audio-visual speech separation using still images

no code implementations14 May 2020 Soo-Whan Chung, Soyeon Choe, Joon Son Chung, Hong-Goo Kang

The objective of this paper is to separate a target speaker's speech from a mixture of two speakers using a deep audio-visual speech separation network.

Speech Separation

Improving LPCNet-based Text-to-Speech with Linear Prediction-structured Mixture Density Network

1 code implementation31 Jan 2020 Min-Jae Hwang, Eunwoo Song, Ryuichi Yamamoto, Frank Soong, Hong-Goo Kang

In this paper, we propose an improved LPCNet vocoder using a linear prediction (LP)-structured mixture density network (MDN).
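The linear prediction structure that LPCNet-style vocoders exploit splits speech into an LP filter (spectral envelope) plus a residual excitation, which is the part the neural model focuses on. The toy sketch below generates an AR(2) signal with known coefficients and verifies that LP analysis recovers the excitation; the real vocoder estimates LP coefficients from speech and models the residual with a network, none of which is shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy AR(2) "speech" signal with known LP coefficients.
a = np.array([1.3, -0.4])            # stable filter (roots 0.8 and 0.5)
n = 2000
e = 0.1 * rng.standard_normal(n)     # excitation (white noise here)
s = np.zeros(n)
for i in range(2, n):
    s[i] = a[0] * s[i - 1] + a[1] * s[i - 2] + e[i]

# LP analysis: predict each sample from its two predecessors; the
# residual is exactly the excitation for this synthetic signal.
pred = a[0] * s[1:-1] + a[1] * s[:-2]
residual = s[2:] - pred

# Residual energy is far below signal energy: the LP filter explains
# most of the waveform, leaving a compact excitation to model.
```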

Quantization Speech Synthesis

Effective parameter estimation methods for an ExcitNet model in generative text-to-speech systems

1 code implementation21 May 2019 Ohsung Kwon, Eunwoo Song, Jae-Min Kim, Hong-Goo Kang

In this paper, we propose a high-quality generative text-to-speech (TTS) system using an effective spectrum and excitation estimation method.

Speech Synthesis Text-To-Speech Synthesis

ExcitNet vocoder: A neural excitation model for parametric speech synthesis systems

no code implementations9 Nov 2018 Eunwoo Song, Kyungguen Byun, Hong-Goo Kang

Conventional WaveNet-based neural vocoding systems significantly improve the perceptual quality of synthesized speech by statistically generating a time sequence of speech waveforms through an auto-regressive framework.

Speech Synthesis

Speaker-adaptive neural vocoders for parametric speech synthesis systems

no code implementations8 Nov 2018 Eunwoo Song, Jin-Seob Kim, Kyungguen Byun, Hong-Goo Kang

To generate more natural speech signals with the constraint of limited training data, we propose a speaker adaptation task with an effective variation of neural vocoding models.

Speech Synthesis

Perfect match: Improved cross-modal embeddings for audio-visual synchronisation

no code implementations21 Sep 2018 Soo-Whan Chung, Joon Son Chung, Hong-Goo Kang

This paper proposes a new strategy for learning powerful cross-modal embeddings for audio-to-video synchronization.
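Synchronization with cross-modal embeddings typically works as a multi-way matching problem: one audio embedding is compared against video embeddings at several temporal offsets, and the truly synchronized offset should score highest. The sketch below mimics that test-time comparison with toy vectors; real embeddings would come from the two trained sub-networks, and the index names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy multi-way matching: video embeddings at 5 candidate offsets, and
# an audio embedding that is a noisy copy of the offset-2 embedding,
# standing in for the truly synchronized pair.
dim, n_offsets = 128, 5
video = rng.standard_normal((n_offsets, dim))
sync_idx = 2
audio = video[sync_idx] + 0.1 * rng.standard_normal(dim)

sims = np.array([cosine(audio, v) for v in video])
predicted = int(np.argmax(sims))     # offset with the highest similarity
```

Training sharpens exactly this ranking: a cross-entropy over the candidate similarities pushes the synchronized offset's score above the others.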

Binary Classification Cross-Modal Retrieval +4
