no code implementations • 18 Jun 2024 • Miseul Kim, Soo-Whan Chung, Youna Ji, Hong-Goo Kang, Min-Seok Choi
This paper introduces a novel task in generative speech processing, Acoustic Scene Transfer (AST), which aims to transfer the acoustic scene of a speech signal to diverse target environments.
1 code implementation • 16 Jun 2023 • Woo-Jin Chung, Doyeon Kim, Soo-Whan Chung, Hong-Goo Kang
We introduce Multi-level feature Fusion-based Periodicity Analysis Model (MF-PAM), a novel deep learning-based pitch estimation model that accurately estimates pitch trajectory in noisy and reverberant acoustic environments.
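For context on the task itself, below is a minimal classical autocorrelation-based pitch estimate for a single frame; it is a textbook baseline sketch, not MF-PAM's multi-level feature fusion approach, and the frame length and search range are illustrative assumptions.

```python
import numpy as np

def autocorr_pitch(frame, sr, fmin=60.0, fmax=400.0):
    """Classical single-frame autocorrelation pitch estimate.
    Textbook baseline for context only -- not MF-PAM's method."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)      # lag range for the F0 search
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

sr = 16000
t = np.arange(0, 0.04, 1.0 / sr)                 # one 40 ms analysis frame
print(autocorr_pitch(np.sin(2 * np.pi * 120 * t), sr))  # ~120 Hz
```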
no code implementations • 2 Jun 2023 • Doyeon Kim, Soo-Whan Chung, Hyewon Han, Youna Ji, Hong-Goo Kang
This paper introduces an end-to-end neural speech restoration model, HD-DEMUCS, demonstrating efficacy across multiple distortion environments.
1 code implementation • 27 Feb 2023 • Jiyoung Lee, Joon Son Chung, Soo-Whan Chung
This is the first time that face images are used as a condition to train a TTS model.
no code implementations • 27 Feb 2023 • Yoohwan Kwon, Soo-Whan Chung
Based on this reliability, the activated expert and the language-agnostic expert are aggregated into a language-conditioned embedding for efficient speech recognition.
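A minimal sketch of what such reliability-weighted aggregation could look like; the function name, the linear interpolation, and the scalar reliability score are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def aggregate_experts(lang_expert_emb, agnostic_emb, reliability):
    """Blend the activated language expert's embedding with the
    language-agnostic expert's embedding, weighted by a scalar
    reliability score in [0, 1] (illustrative scheme only)."""
    reliability = float(np.clip(reliability, 0.0, 1.0))
    return reliability * lang_expert_emb + (1.0 - reliability) * agnostic_emb

# Low reliability falls back toward the language-agnostic expert.
print(aggregate_experts(np.ones(4), np.zeros(4), reliability=0.3))  # [0.3 0.3 0.3 0.3]
```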
1 code implementation • 31 Oct 2022 • Robin Scheibler, Youna Ji, Soo-Whan Chung, Jaeuk Byun, Soyeon Choe, Min-Seok Choi
We propose DiffSep, a new single channel source separation method based on score-matching of a stochastic differential equation (SDE).
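A hedged sketch of the denoising score-matching objective that underlies SDE-based generative separation; the placeholder score model and the noise level below are assumptions, not DiffSep's actual network or noise schedule.

```python
import numpy as np

rng = np.random.default_rng(0)

def score_net(x_t, t):
    """Placeholder score model; a real system uses a deep network."""
    return -x_t / (1.0 + t)

def dsm_loss(x0, sigma_t, t):
    """Single-sample denoising score-matching loss: perturb clean data x0
    with Gaussian noise of std sigma_t, then regress the score of the
    perturbation kernel, -(x_t - x0) / sigma_t**2."""
    x_t = x0 + sigma_t * rng.standard_normal(x0.shape)
    target = -(x_t - x0) / sigma_t**2
    return np.mean((score_net(x_t, t) - target) ** 2)

print(dsm_loss(np.zeros(8), sigma_t=0.5, t=0.1))
```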
1 code implementation • 30 Jun 2022 • Hyeon-Kyeong Shin, Hyewon Han, Doyeon Kim, Soo-Whan Chung, Hong-Goo Kang
In this paper, we propose a novel end-to-end user-defined keyword spotting method that utilizes linguistically corresponding patterns between speech and text sequences.
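A toy sketch of scoring a user-defined keyword from speech/text correspondence; the cosine-similarity matching and best-frame pooling below are illustrative assumptions, not the proposed model.

```python
import numpy as np

def keyword_score(speech_emb, text_emb):
    """Toy detection score: cosine similarity between every speech frame
    and every keyword token, pooled by each token's best-matching frame.

    speech_emb : (T, D) frame-level speech embeddings
    text_emb   : (L, D) token-level embeddings of the enrolled keyword
    """
    s = speech_emb / np.linalg.norm(speech_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    sim = s @ t.T                       # (T, L) similarity matrix
    return float(sim.max(axis=0).mean())

rng = np.random.default_rng(0)
print(keyword_score(rng.standard_normal((50, 16)), rng.standard_normal((5, 16))))
```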
no code implementations • 28 Mar 2022 • Jee-weon Jung, Hemlata Tak, Hye-jin Shim, Hee-Soo Heo, Bong-Jin Lee, Soo-Whan Chung, Ha-Jin Yu, Nicholas Evans, Tomi Kinnunen
Pre-trained spoofing detection and speaker verification models are provided as open source and are used in two baseline SASV solutions.
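One common way to combine the two subsystems is score-level fusion; the sketch below simply sums a speaker-verification score and a countermeasure score, which approximates a score-sum baseline but may not match the challenge baseline's exact scaling or normalisation.

```python
def sasv_fused_score(asv_score, cm_score):
    """Score-level fusion for spoofing-aware speaker verification: a trial
    should be accepted only if the speaker matches AND the speech looks
    bona fide, so the two scores are summed (illustrative scaling)."""
    return asv_score + cm_score

print(sasv_fused_score(asv_score=0.8, cm_score=0.9))   # target, bona fide -> high
print(sasv_fused_score(asv_score=0.8, cm_score=-0.9))  # target, spoofed   -> low
```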
no code implementations • 24 Feb 2022 • Doyeon Kim, Hyewon Han, Hyeon-Kyeong Shin, Soo-Whan Chung, Hong-Goo Kang
Modern neural speech enhancement models usually include various forms of phase information in their training loss terms, either explicitly or implicitly.
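A small illustration of the difference between a magnitude-only spectral loss and a complex-spectrum loss that carries phase information implicitly; the loss definitions are generic examples, not the specific terms used in the paper.

```python
import numpy as np

def magnitude_loss(est, ref):
    """Spectral loss on magnitudes only -- phase errors are invisible."""
    return np.mean(np.abs(np.abs(est) - np.abs(ref)))

def complex_loss(est, ref):
    """Loss on complex spectra -- phase mismatch is penalised implicitly."""
    return np.mean(np.abs(est - ref))

rng = np.random.default_rng(0)
ref = np.fft.rfft(rng.standard_normal(512))
est = ref * np.exp(1j * 0.5)           # identical magnitude, rotated phase

print(magnitude_loss(est, ref))        # ~0: blind to the phase error
print(complex_loss(est, ref))          # > 0: sees the phase shift
```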
1 code implementation • 17 Aug 2021 • You Jin Kim, Hee-Soo Heo, Soyeon Choe, Soo-Whan Chung, Yoohwan Kwon, Bong-Jin Lee, Youngki Kwon, Joon Son Chung
Face tracks are extracted from the videos, and active segments are annotated semi-automatically based on the timestamps of VoxConverse.
no code implementations • CVPR 2021 • Jiyoung Lee, Soo-Whan Chung, Sunok Kim, Hong-Goo Kang, Kwanghoon Sohn
In this paper, we address the problem of separating individual speech signals from videos using audio-visual neural processing.
no code implementations • 4 Aug 2020 • Hyewon Han, Soo-Whan Chung, Hong-Goo Kang
Many approaches can derive information about a single speaker's identity from speech by learning to recognize consistent characteristics of acoustic parameters.
no code implementations • 18 May 2020 • You Jin Kim, Hee Soo Heo, Soo-Whan Chung, Bong-Jin Lee
The goal of this work is to synchronise audio and video of a talking face using deep neural network models.
no code implementations • 14 May 2020 • Soo-Whan Chung, Soyeon Choe, Joon Son Chung, Hong-Goo Kang
The objective of this paper is to separate a target speaker's speech from a mixture of two speakers using a deep audio-visual speech separation network.
no code implementations • 29 Apr 2020 • Soo-Whan Chung, Hong-Goo Kang, Joon Son Chung
We build on earlier work to train embeddings that are more discriminative for uni-modal downstream tasks.
no code implementations • 21 Sep 2018 • Soo-Whan Chung, Joon Son Chung, Hong-Goo Kang
This paper proposes a new strategy for learning powerful cross-modal embeddings for audio-to-video synchronization.
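One such strategy frames synchronisation as picking the matching audio segment among several temporal offsets; the multi-way matching loss below is a hedged sketch under that assumption, with Euclidean distances and the candidate count chosen purely for illustration.

```python
import numpy as np

def multiway_matching_loss(video_emb, audio_candidates, true_idx):
    """Multi-way matching objective: score each candidate audio embedding
    by negative distance to the video embedding and apply a softmax
    cross-entropy against the truly synchronised candidate."""
    dists = np.linalg.norm(audio_candidates - video_emb, axis=1)   # (N,)
    logits = -dists
    logits = logits - logits.max()                   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[true_idx]

rng = np.random.default_rng(0)
v = rng.standard_normal(256)
cands = rng.standard_normal((5, 256))
cands[2] = v + 0.1 * rng.standard_normal(256)        # candidate 2 is in sync
print(multiway_matching_loss(v, cands, true_idx=2))  # small loss for the synced pair
```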