Search Results for author: Soo-Whan Chung

Found 15 papers, 5 papers with code

MF-PAM: Accurate Pitch Estimation through Periodicity Analysis and Multi-level Feature Fusion

1 code implementation • 16 Jun 2023 • Woo-Jin Chung, Doyeon Kim, Soo-Whan Chung, Hong-Goo Kang

We introduce the Multi-level Feature Fusion-based Periodicity Analysis Model (MF-PAM), a novel deep learning-based pitch estimation model that accurately estimates pitch trajectories in noisy and reverberant acoustic environments.

Audio Signal Processing

HD-DEMUCS: General Speech Restoration with Heterogeneous Decoders

no code implementations • 2 Jun 2023 • Doyeon Kim, Soo-Whan Chung, Hyewon Han, Youna Ji, Hong-Goo Kang

This paper introduces an end-to-end neural speech restoration model, HD-DEMUCS, demonstrating efficacy across multiple distortion environments.

MoLE: Mixture of Language Experts for Multi-Lingual Automatic Speech Recognition

no code implementations • 27 Feb 2023 • Yoohwan Kwon, Soo-Whan Chung

Based on this reliability, the activated expert and the language-agnostic expert are aggregated into a language-conditioned embedding for efficient speech recognition.

Automatic Speech Recognition (ASR) +1
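The abstract describes gating a per-language expert against a language-agnostic expert by the reliability of the language prediction. A minimal sketch of that aggregation, assuming a simple softmax gate; all function names here are hypothetical illustrations, not the authors' implementation:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def mole_embed(frame, experts, agnostic, lang_logits):
    """Hypothetical sketch of MoLE-style aggregation: the most probable
    language expert is activated, and its output is mixed with the
    language-agnostic expert, weighted by the gate's reliability."""
    probs = softmax(lang_logits)      # language-ID posterior
    k = int(np.argmax(probs))         # activated (most probable) expert
    r = probs[k]                      # reliability of that prediction
    return r * experts[k](frame) + (1.0 - r) * agnostic(frame)
```

When the language prediction is confident (r near 1), the embedding is dominated by the matching expert; when uncertain, it falls back toward the language-agnostic representation.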

Diffusion-based Generative Speech Source Separation

1 code implementation • 31 Oct 2022 • Robin Scheibler, Youna Ji, Soo-Whan Chung, Jaeuk Byun, Soyeon Choe, Min-Seok Choi

We propose DiffSep, a new single channel source separation method based on score-matching of a stochastic differential equation (SDE).

Speech Enhancement
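DiffSep is described as separation via score matching of an SDE: a learned score model drives a reverse-time diffusion from the mixture toward separated sources. A minimal sketch of one Euler-Maruyama reverse step, assuming a variance-exploding-style SDE with zero drift; this is a generic score-based sampler sketch, not DiffSep's actual formulation:

```python
import numpy as np

def reverse_step(x, score_fn, t, dt, g, rng):
    """One hypothetical Euler-Maruyama step of a reverse-time SDE:
    x_{t-dt} = x_t + g(t)^2 * score(x_t, t) * dt + g(t) * sqrt(dt) * z,
    where z ~ N(0, I). Iterating from t=T down to 0 yields a sample."""
    z = rng.standard_normal(x.shape)
    return x + (g(t) ** 2) * score_fn(x, t) * dt + g(t) * np.sqrt(dt) * z
```

In a separation setting, `x` would stack the source estimates and the score model would be conditioned on the observed mixture.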

Learning Audio-Text Agreement for Open-vocabulary Keyword Spotting

1 code implementation • 30 Jun 2022 • Hyeon-Kyeong Shin, Hyewon Han, Doyeon Kim, Soo-Whan Chung, Hong-Goo Kang

In this paper, we propose a novel end-to-end user-defined keyword spotting method that utilizes linguistically corresponding patterns between speech and text sequences.

Keyword Spotting

SASV 2022: The First Spoofing-Aware Speaker Verification Challenge

no code implementations • 28 Mar 2022 • Jee-weon Jung, Hemlata Tak, Hye-jin Shim, Hee-Soo Heo, Bong-Jin Lee, Soo-Whan Chung, Ha-Jin Yu, Nicholas Evans, Tomi Kinnunen

Pre-trained spoofing detection and speaker verification models are provided as open source and are used in two baseline SASV solutions.

Speaker Verification

Phase Continuity: Learning Derivatives of Phase Spectrum for Speech Enhancement

no code implementations • 24 Feb 2022 • Doyeon Kim, Hyewon Han, Hyeon-Kyeong Shin, Soo-Whan Chung, Hong-Goo Kang

Modern neural speech enhancement models usually include various forms of phase information in their training loss terms, either explicitly or implicitly.

Speech Enhancement
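The title points to training on derivatives of the phase spectrum rather than raw phase, which is only defined modulo 2π. A minimal sketch of such a loss, assuming L1 distances between wrapped phase differences along time (instantaneous frequency) and frequency (group delay); the exact loss in the paper may differ, and the function names are illustrative:

```python
import numpy as np

def wrap(x):
    """Wrap phase values into (-pi, pi]."""
    return np.angle(np.exp(1j * x))

def phase_derivative_loss(phase_est, phase_ref):
    """Hypothetical sketch of a phase-derivative loss on (freq, time)
    spectrograms: compare wrapped phase differences along the time axis
    (instantaneous frequency) and the frequency axis (group delay)."""
    d = lambda p, ax: wrap(np.diff(p, axis=ax))
    l_if = np.abs(wrap(d(phase_est, 1) - d(phase_ref, 1))).mean()
    l_gd = np.abs(wrap(d(phase_est, 0) - d(phase_ref, 0))).mean()
    return l_if + l_gd
```

A useful property of this formulation: a constant phase offset leaves the derivatives unchanged, so the loss penalizes only the phase structure that affects perceived quality.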

Look Who's Talking: Active Speaker Detection in the Wild

1 code implementation • 17 Aug 2021 • You Jin Kim, Hee-Soo Heo, Soyeon Choe, Soo-Whan Chung, Yoohwan Kwon, Bong-Jin Lee, Youngki Kwon, Joon Son Chung

Face tracks are extracted from the videos and active segments are annotated based on the timestamps of VoxConverse in a semi-automatic way.

MIRNet: Learning multiple identities representations in overlapped speech

no code implementations • 4 Aug 2020 • Hyewon Han, Soo-Whan Chung, Hong-Goo Kang

Many approaches can derive a single speaker's identity from speech by learning to recognize consistent characteristics of its acoustic parameters.

Speaker Verification +1

End-to-End Lip Synchronisation Based on Pattern Classification

no code implementations • 18 May 2020 • You Jin Kim, Hee Soo Heo, Soo-Whan Chung, Bong-Jin Lee

The goal of this work is to synchronise audio and video of a talking face using deep neural network models.

General Classification

FaceFilter: Audio-visual speech separation using still images

no code implementations • 14 May 2020 • Soo-Whan Chung, Soyeon Choe, Joon Son Chung, Hong-Goo Kang

The objective of this paper is to separate a target speaker's speech from a mixture of two speakers using a deep audio-visual speech separation network.

Speech Separation

Perfect match: Improved cross-modal embeddings for audio-visual synchronisation

no code implementations • 21 Sep 2018 • Soo-Whan Chung, Joon Son Chung, Hong-Goo Kang

This paper proposes a new strategy for learning powerful cross-modal embeddings for audio-to-video synchronization.

Binary Classification Cross-Modal Retrieval +4
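For audio-to-video synchronization with cross-modal embeddings, a common formulation is to score one video embedding against audio embeddings at several candidate temporal offsets and pick the best match. A minimal sketch of that scoring, assuming negative Euclidean distance as the similarity and a softmax over offsets; this is a generic illustration, not the paper's exact training objective:

```python
import numpy as np

def sync_probs(v_emb, a_candidates):
    """Hypothetical sketch: score each candidate audio offset by negative
    Euclidean distance to the video embedding, then softmax over offsets.
    v_emb: (d,) video embedding; a_candidates: (n_offsets, d) audio embeddings."""
    logits = -np.linalg.norm(a_candidates - v_emb, axis=1)
    e = np.exp(logits - logits.max())
    return e / e.sum()
```

The in-sync offset should receive the highest probability; training then amounts to a multi-way classification over the candidate offsets.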
