no code implementations • 26 Sep 2023 • Hee-Soo Heo, Kihyun Nam, Bong-Jin Lee, Youngki Kwon, Minjae Lee, You Jin Kim, Joon Son Chung
In the field of speaker verification, session or channel variability poses a significant challenge.
no code implementations • 1 Jun 2023 • Jee-weon Jung, Soonshin Seo, Hee-Soo Heo, Geonmin Kim, You Jin Kim, Young-ki Kwon, Minjae Lee, Bong-Jin Lee
The task of speaker change detection (SCD), which detects points where speakers change in an input, is essential for several applications.
no code implementations • 9 Nov 2022 • Youngki Kwon, Hee-Soo Heo, Bong-Jin Lee, You Jin Kim, Jee-weon Jung
Our focus lies in developing an online speaker diarisation framework which demonstrates robust performance across diverse domains.
no code implementations • 8 Nov 2022 • Hee-Soo Heo, Youngki Kwon, Bong-Jin Lee, You Jin Kim, Jee-weon Jung
Extracted dense frame-level embeddings can each represent a speaker.
no code implementations • 26 Oct 2022 • Jee-weon Jung, Hee-Soo Heo, Bong-Jin Lee, Jaesung Huh, Andrew Brown, Youngki Kwon, Shinji Watanabe, Joon Son Chung
First, the evaluation is not straightforward because the features required for better performance differ between speaker verification and diarisation.
no code implementations • 20 Oct 2022 • Jee-weon Jung, Hee-Soo Heo, Bong-Jin Lee, Jaesong Lee, Hye-jin Shim, Youngki Kwon, Joon Son Chung, Shinji Watanabe
We also show that training with proposed large data configurations gives better performance.
no code implementations • 28 Mar 2022 • Jee-weon Jung, Hemlata Tak, Hye-jin Shim, Hee-Soo Heo, Bong-Jin Lee, Soo-Whan Chung, Ha-Jin Yu, Nicholas Evans, Tomi Kinnunen
Pre-trained spoofing detection and speaker verification models are provided as open source and are used in two baseline SASV solutions.
no code implementations • 28 Mar 2022 • Hee-Soo Heo, Jee-weon Jung, Jingu Kang, Youngki Kwon, You Jin Kim, Bong-Jin Lee, Joon Son Chung
The goal of this paper is to train effective self-supervised speaker representations without identity labels.
2 code implementations • 16 Mar 2022 • Jee-weon Jung, You Jin Kim, Hee-Soo Heo, Bong-Jin Lee, Youngki Kwon, Joon Son Chung
Our best model achieves an equal error rate of 0. 89%, which is competitive with the state-of-the-art models based on handcrafted features, and outperforms the best model based on raw waveform inputs by a large margin.
no code implementations • 7 Oct 2021 • You Jin Kim, Hee-Soo Heo, Jee-weon Jung, Youngki Kwon, Bong-Jin Lee, Joon Son Chung
The objective of this work is to train noise-robust speaker embeddings adapted for speaker diarisation.
no code implementations • 7 Oct 2021 • Youngki Kwon, Hee-Soo Heo, Jee-weon Jung, You Jin Kim, Bong-Jin Lee, Joon Son Chung
The objective of this work is effective speaker diarisation using multi-scale speaker embeddings.
2 code implementations • 4 Oct 2021 • Jee-weon Jung, Hee-Soo Heo, Hemlata Tak, Hye-jin Shim, Joon Son Chung, Bong-Jin Lee, Ha-Jin Yu, Nicholas Evans
Artefacts that differentiate spoofed from bona-fide utterances can reside in spectral or temporal domains.
Ranked #1 on
Voice Anti-spoofing
on ASVspoof 2019 - LA
1 code implementation • 17 Aug 2021 • You Jin Kim, Hee-Soo Heo, Soyeon Choe, Soo-Whan Chung, Yoohwan Kwon, Bong-Jin Lee, Youngki Kwon, Joon Son Chung
Face tracks are extracted from the videos and active segments are annotated based on the timestamps of VoxConverse in a semi-automatic way.
no code implementations • 7 Apr 2021 • Youngki Kwon, Jee-weon Jung, Hee-Soo Heo, You Jin Kim, Bong-Jin Lee, Joon Son Chung
The goal of this paper is to adapt speaker embeddings for solving the problem of speaker diarisation.
no code implementations • 7 Apr 2021 • Jee-weon Jung, Hee-Soo Heo, Youngki Kwon, Joon Son Chung, Bong-Jin Lee
In this work, we propose an overlapped speech detection system trained as a three-class classifier.
no code implementations • 22 Oct 2020 • Jee-weon Jung, Hee-Soo Heo, Ha-Jin Yu, Joon Son Chung
The proposed framework inputs segment-wise speaker embeddings from an enrollment and a test utterance and directly outputs a similarity score.
no code implementations • 31 Jan 2020 • Jee-weon Jung, Hye-jin Shim, Hee-Soo Heo, Ha-Jin Yu
For addition, we utilize the multi-task learning framework to include subsidiary information to the code.
no code implementations • 22 Oct 2019 • Hye-jin Shim, Hee-Soo Heo, Jee-weon Jung, Ha-Jin Yu
Constructing a dataset for replay spoofing detection requires a physical process of playing an utterance and re-recording it, presenting a challenge to the collection of large-scale datasets.
no code implementations • 1 Jul 2019 • Hee-Soo Heo, Jee-weon Jung, Hye-jin Shim, IL-Ho Yang, Ha-Jin Yu
In particular, the adversarial process degrades the performance of the subsidiary model by eliminating the subsidiary information in the input which, in assumption, may degrade the performance of the primary model.
1 code implementation • 23 Apr 2019 • Jee-weon Jung, Hye-jin Shim, Hee-Soo Heo, Ha-Jin Yu
To detect unrevealed characteristics that reside in a replayed speech, we directly input spectrograms into an end-to-end DNN without knowledge-based intervention.
4 code implementations • 17 Apr 2019 • Jee-weon Jung, Hee-Soo Heo, Ju-ho Kim, Hye-jin Shim, Ha-Jin Yu
In this study, we explore end-to-end deep neural networks that input raw waveforms to improve various aspects: front-end speaker embedding extraction including model architecture, pre-training scheme, additional objective functions, and back-end classification.
no code implementations • 7 Feb 2019 • Hee-Soo Heo, Jee-weon Jung, IL-Ho Yang, Sung-Hyun Yoon, Hye-jin Shim, Ha-Jin Yu
Each speaker basis is designed to represent the corresponding speaker in the process of training deep neural networks.
no code implementations • 25 Oct 2018 • Jee-weon Jung, Hee-Soo Heo, Hye-jin Shim, Ha-Jin Yu
The short duration of an input utterance is one of the most critical threats that degrade the performance of speaker verification systems.
no code implementations • 29 Aug 2018 • Hye-jin Shim, Jee-weon Jung, Hee-Soo Heo, Sung-Hyun Yoon, Ha-Jin Yu
We explore the effectiveness of training a deep neural network simultaneously for replay attack spoofing detection and replay noise classification.