Search Results for author: Soo-Whan Chung

Found 15 papers, 5 papers with code

MF-PAM: Accurate Pitch Estimation through Periodicity Analysis and Multi-level Feature Fusion

1 code implementation • 16 Jun 2023 • Woo-Jin Chung, Doyeon Kim, Soo-Whan Chung, Hong-Goo Kang

We introduce the Multi-level Feature Fusion-based Periodicity Analysis Model (MF-PAM), a novel deep learning-based pitch estimation model that accurately estimates pitch trajectories in noisy and reverberant acoustic environments.

Audio Signal Processing

HD-DEMUCS: General Speech Restoration with Heterogeneous Decoders

no code implementations • 2 Jun 2023 • Doyeon Kim, Soo-Whan Chung, Hyewon Han, Youna Ji, Hong-Goo Kang

This paper introduces an end-to-end neural speech restoration model, HD-DEMUCS, demonstrating efficacy across multiple distortion environments.

MoLE: Mixture of Language Experts for Multi-Lingual Automatic Speech Recognition

no code implementations • 27 Feb 2023 • Yoohwan Kwon, Soo-Whan Chung

Based on this reliability, the activated expert and the language-agnostic expert are aggregated into a language-conditioned embedding for efficient speech recognition.

Automatic Speech Recognition (ASR) +1
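The abstract describes gating a per-language expert against a language-agnostic expert by the reliability of the language prediction. A minimal sketch of that aggregation, assuming a simple softmax gate; all function names here are hypothetical illustrations, not the authors' implementation:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def mole_embed(frame, experts, agnostic, lang_logits):
    """Hypothetical sketch of MoLE-style aggregation: the most probable
    language expert is activated, and its output is mixed with the
    language-agnostic expert, weighted by the gate's reliability."""
    probs = softmax(lang_logits)      # language-ID posterior
    k = int(np.argmax(probs))         # activated (most probable) expert
    r = probs[k]                      # reliability of that prediction
    return r * experts[k](frame) + (1.0 - r) * agnostic(frame)
```

When the language prediction is confident (r near 1), the embedding is dominated by the matching expert; when uncertain, it falls back toward the language-agnostic representation.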

Diffusion-based Generative Speech Source Separation

1 code implementation • 31 Oct 2022 • Robin Scheibler, Youna Ji, Soo-Whan Chung, Jaeuk Byun, Soyeon Choe, Min-Seok Choi

We propose DiffSep, a new single channel source separation method based on score-matching of a stochastic differential equation (SDE).

Speech Enhancement
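DiffSep is described as separation via score matching of an SDE: a learned score model drives a reverse-time diffusion from the mixture toward separated sources. A minimal sketch of one Euler-Maruyama reverse step, assuming a variance-exploding-style SDE with zero drift; this is a generic score-based sampler sketch, not DiffSep's actual formulation:

```python
import numpy as np

def reverse_step(x, score_fn, t, dt, g, rng):
    """One hypothetical Euler-Maruyama step of a reverse-time SDE:
    x_{t-dt} = x_t + g(t)^2 * score(x_t, t) * dt + g(t) * sqrt(dt) * z,
    where z ~ N(0, I). Iterating from t=T down to 0 yields a sample."""
    z = rng.standard_normal(x.shape)
    return x + (g(t) ** 2) * score_fn(x, t) * dt + g(t) * np.sqrt(dt) * z
```

In a separation setting, `x` would stack the source estimates and the score model would be conditioned on the observed mixture.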

Learning Audio-Text Agreement for Open-vocabulary Keyword Spotting

1 code implementation • 30 Jun 2022 • Hyeon-Kyeong Shin, Hyewon Han, Doyeon Kim, Soo-Whan Chung, Hong-Goo Kang

In this paper, we propose a novel end-to-end user-defined keyword spotting method that utilizes linguistically corresponding patterns between speech and text sequences.

Keyword Spotting

SASV 2022: The First Spoofing-Aware Speaker Verification Challenge

no code implementations • 28 Mar 2022 • Jee-weon Jung, Hemlata Tak, Hye-jin Shim, Hee-Soo Heo, Bong-Jin Lee, Soo-Whan Chung, Ha-Jin Yu, Nicholas Evans, Tomi Kinnunen

Pre-trained spoofing detection and speaker verification models are provided as open source and are used in two baseline SASV solutions.

Speaker Verification

Phase Continuity: Learning Derivatives of Phase Spectrum for Speech Enhancement

no code implementations • 24 Feb 2022 • Doyeon Kim, Hyewon Han, Hyeon-Kyeong Shin, Soo-Whan Chung, Hong-Goo Kang

Modern neural speech enhancement models usually include various forms of phase information in their training loss terms, either explicitly or implicitly.

Speech Enhancement
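The title points to training on derivatives of the phase spectrum rather than raw phase, which is only defined modulo 2π. A minimal sketch of such a loss, assuming L1 distances between wrapped phase differences along time (instantaneous frequency) and frequency (group delay); the exact loss in the paper may differ, and the function names are illustrative:

```python
import numpy as np

def wrap(x):
    """Wrap phase values into (-pi, pi]."""
    return np.angle(np.exp(1j * x))

def phase_derivative_loss(phase_est, phase_ref):
    """Hypothetical sketch of a phase-derivative loss on (freq, time)
    spectrograms: compare wrapped phase differences along the time axis
    (instantaneous frequency) and the frequency axis (group delay)."""
    d = lambda p, ax: wrap(np.diff(p, axis=ax))
    l_if = np.abs(wrap(d(phase_est, 1) - d(phase_ref, 1))).mean()
    l_gd = np.abs(wrap(d(phase_est, 0) - d(phase_ref, 0))).mean()
    return l_if + l_gd
```

A useful property of this formulation: a constant phase offset leaves the derivatives unchanged, so the loss penalizes only the phase structure that affects perceived quality.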

Look Who's Talking: Active Speaker Detection in the Wild

1 code implementation • 17 Aug 2021 • You Jin Kim, Hee-Soo Heo, Soyeon Choe, Soo-Whan Chung, Yoohwan Kwon, Bong-Jin Lee, Youngki Kwon, Joon Son Chung

Face tracks are extracted from the videos and active segments are annotated based on the timestamps of VoxConverse in a semi-automatic way.

MIRNet: Learning multiple identities representations in overlapped speech

no code implementations • 4 Aug 2020 • Hyewon Han, Soo-Whan Chung, Hong-Goo Kang

Many approaches can derive a single speaker's identity from speech by learning to recognize consistent characteristics of its acoustic parameters.

Speaker Verification +1

End-to-End Lip Synchronisation Based on Pattern Classification

no code implementations • 18 May 2020 • You Jin Kim, Hee Soo Heo, Soo-Whan Chung, Bong-Jin Lee

The goal of this work is to synchronise audio and video of a talking face using deep neural network models.

General Classification

FaceFilter: Audio-visual speech separation using still images

no code implementations • 14 May 2020 • Soo-Whan Chung, Soyeon Choe, Joon Son Chung, Hong-Goo Kang

The objective of this paper is to separate a target speaker's speech from a mixture of two speakers using a deep audio-visual speech separation network.

Speech Separation

Perfect match: Improved cross-modal embeddings for audio-visual synchronisation

no code implementations • 21 Sep 2018 • Soo-Whan Chung, Joon Son Chung, Hong-Goo Kang

This paper proposes a new strategy for learning powerful cross-modal embeddings for audio-to-video synchronization.

Binary Classification Cross-Modal Retrieval +4
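For audio-to-video synchronization with cross-modal embeddings, a common formulation is to score one video embedding against audio embeddings at several candidate temporal offsets and pick the best match. A minimal sketch of that scoring, assuming negative Euclidean distance as the similarity and a softmax over offsets; this is a generic illustration, not the paper's exact training objective:

```python
import numpy as np

def sync_probs(v_emb, a_candidates):
    """Hypothetical sketch: score each candidate audio offset by negative
    Euclidean distance to the video embedding, then softmax over offsets.
    v_emb: (d,) video embedding; a_candidates: (n_offsets, d) audio embeddings."""
    logits = -np.linalg.norm(a_candidates - v_emb, axis=1)
    e = np.exp(logits - logits.max())
    return e / e.sum()
```

The in-sync offset should receive the highest probability; training then amounts to a multi-way classification over the candidate offsets.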
