1 code implementation • 11 Jun 2024 • Seung-bin Kim, Chan-yeong Lim, Jungwoo Heo, Ju-ho Kim, Hyun-seo Shin, Kyo-Won Koo, Ha-Jin Yu
In speaker verification systems, the utilization of short utterances presents a persistent challenge, leading to performance degradation primarily due to insufficient phonetic information to characterize the speakers.
1 code implementation • 15 Dec 2023 • Young Joo Han, Ha-Jin Yu
Therefore, recent studies have proposed methods that employ data-driven generative models, such as Generative Adversarial Networks (GANs) and Normalizing Flows.
1 code implementation • 15 Sep 2023 • Hyun-seo Shin, Jungwoo Heo, Ju-ho Kim, Chan-yeong Lim, Wonbin Kim, Ha-Jin Yu
Audio deepfake detection (ADD) is the task of detecting spoofing attacks generated by text-to-speech or voice conversion systems.
1 code implementation • 14 Sep 2023 • Ju-ho Kim, Jungwoo Heo, Hyun-seo Shin, Chan-yeong Lim, Ha-Jin Yu
Diff-SV unifies a DPM-based speech enhancement system with a speaker embedding extractor, and yields a discriminative and noise-tolerant speaker representation through a hierarchical structure.
1 code implementation • 20 Jul 2023 • Wonbin Kim, Hyun-seo Shin, Ju-ho Kim, Jungwoo Heo, Chan-yeong Lim, Ha-Jin Yu
In this paper, we propose a new additive noise method, partial additive speech (PAS), which aims to train SV systems to be less affected by noisy environments.
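Only the abstract sentence is given here, but the core idea of adding noise to just part of an utterance can be sketched. The segment-selection policy, ratio, and SNR handling below are assumptions for illustration, not the paper's exact PAS formulation:

```python
import numpy as np

def partial_additive_noise(speech, noise, ratio=0.5, snr_db=5.0, rng=None):
    """Add noise to only a contiguous fraction of the speech signal.

    Hypothetical sketch of the 'partial additive' idea: noise is mixed
    into a randomly placed segment covering `ratio` of the utterance,
    scaled to `snr_db` relative to that segment's speech power.
    """
    rng = np.random.default_rng(rng)
    n = len(speech)
    seg_len = int(n * ratio)
    start = rng.integers(0, n - seg_len + 1)

    # Scale the noise to the target SNR within the chosen segment.
    seg = speech[start:start + seg_len]
    noise_seg = noise[:seg_len]
    speech_pow = np.mean(seg ** 2) + 1e-12
    noise_pow = np.mean(noise_seg ** 2) + 1e-12
    scale = np.sqrt(speech_pow / (noise_pow * 10 ** (snr_db / 10)))

    out = speech.copy()
    out[start:start + seg_len] += scale * noise_seg
    return out
```

The rest of the waveform stays clean, so the model sees noisy and clean conditions within a single training example.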
1 code implementation • 27 May 2023 • Jungwoo Heo, Chan-yeong Lim, Ju-ho Kim, Hyun-seo Shin, Ha-Jin Yu
This paper proposes One-Step Knowledge Distillation and Fine-Tuning (OS-KDFT), which incorporates KD and fine-tuning (FT) into a single optimization step.
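A common way to combine distillation and task fine-tuning in one optimization step is a weighted sum of a soft-label KL term and the task loss. The weighting and temperature below are illustrative assumptions; the paper's OS-KDFT objective may differ:

```python
import numpy as np

def softmax(z, t=1.0):
    z = np.asarray(z, dtype=float) / t
    e = np.exp(z - z.max())
    return e / e.sum()

def joint_kd_ft_loss(student_logits, teacher_logits, label,
                     alpha=0.5, temperature=2.0):
    """Single-objective sketch: distillation plus task fine-tuning.

    KD term: KL divergence between temperature-softened teacher and
    student distributions. Task term: cross-entropy on the hard label.
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kd = float(np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))))
    ce = float(-np.log(softmax(student_logits)[label] + 1e-12))
    return alpha * kd + (1.0 - alpha) * ce
```

Optimizing this one loss lets the student compress and adapt to the target task simultaneously rather than in two separate stages.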
1 code implementation • 17 May 2023 • Young-Joo Han, Ha-Jin Yu
However, these methods rely on large-scale noisy-clean image pairs, which are difficult to obtain in practice.
1 code implementation • 4 Nov 2022 • Ju-ho Kim, Jungwoo Heo, Hyun-seo Shin, Chan-yeong Lim, Ha-Jin Yu
To overcome these limitations, the present study explores and applies efficient transfer learning methods in the audio domain.
no code implementations • 3 Nov 2022 • Jungwoo Heo, Hyun-seo Shin, Ju-ho Kim, Chan-yeong Lim, Ha-Jin Yu
In music, short-term features such as pitch and tempo constitute long-term semantic features such as melody and narrative.
1 code implementation • 27 Jun 2022 • Ju-ho Kim, Jungwoo Heo, Hye-jin Shim, Ha-Jin Yu
Background noise is a well-known factor that deteriorates the accuracy and reliability of speaker verification (SV) systems by degrading speech intelligibility.
no code implementations • 28 Mar 2022 • Jee-weon Jung, Hemlata Tak, Hye-jin Shim, Hee-Soo Heo, Bong-Jin Lee, Soo-Whan Chung, Ha-Jin Yu, Nicholas Evans, Tomi Kinnunen
Pre-trained spoofing detection and speaker verification models are provided as open source and are used in two baseline SASV solutions.
1 code implementation • 15 Dec 2021 • Ju-ho Kim, Hye-jin Shim, Jungwoo Heo, Ha-Jin Yu
Although deep neural networks achieve satisfactory performance in speaker verification, variable-duration utterances remain a challenge that threatens the robustness of systems.
2 code implementations • 4 Oct 2021 • Jee-weon Jung, Hee-Soo Heo, Hemlata Tak, Hye-jin Shim, Joon Son Chung, Bong-Jin Lee, Ha-Jin Yu, Nicholas Evans
Artefacts that differentiate spoofed from bona-fide utterances can reside in spectral or temporal domains.
Ranked #1 on Voice Anti-spoofing on ASVspoof 2019 - LA
no code implementations • 15 Apr 2021 • Hye-jin Shim, Jee-weon Jung, Ju-ho Kim, Ha-Jin Yu
Furthermore, adopting the proposed attentive max feature map, our team placed fourth in the recent DCASE 2021 challenge.
no code implementations • 14 Apr 2021 • Ju-ho Kim, Hye-jin Shim, Jee-weon Jung, Ha-Jin Yu
By learning the reliable intermediate representation of the mean teacher network, we expect that the proposed method can explore more discriminative embedding spaces and improve the generalization performance of the speaker verification system.
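The mean teacher scheme referenced here is conventionally implemented as an exponential moving average (EMA) of student parameters, with a consistency term pulling the student's representation toward the teacher's. This is a generic sketch of that machinery, not the paper's specific architecture:

```python
def update_mean_teacher(teacher, student, decay=0.999):
    """EMA update: the teacher tracks a slow average of the student.

    `teacher` and `student` are dicts mapping parameter names to scalar
    values here, standing in for real weight tensors.
    """
    return {k: decay * teacher[k] + (1.0 - decay) * student[k]
            for k in teacher}

def consistency_loss(student_repr, teacher_repr):
    """Mean-squared distance between student and (frozen) teacher
    intermediate representations."""
    return sum((s - t) ** 2 for s, t in zip(student_repr, teacher_repr)) / len(student_repr)
```

Because the teacher averages over many student steps, its representations are smoother, which is why matching them can regularize the student's embedding space.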
no code implementations • 22 Oct 2020 • Jee-weon Jung, Hee-Soo Heo, Ha-Jin Yu, Joon Son Chung
The proposed framework inputs segment-wise speaker embeddings from an enrollment and a test utterance and directly outputs a similarity score.
1 code implementation • 21 Sep 2020 • Jee-weon Jung, Hye-jin Shim, Ju-ho Kim, Ha-Jin Yu
In the acoustic scene and event literature, single-task deep neural networks are being developed that each perform one target task among diverse, cross-related tasks.
no code implementations • 9 Jul 2020 • Hye-jin Shim, Jee-weon Jung, Ju-ho Kim, Ha-Jin Yu
Various experiments are conducted using the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Task 1-A dataset to validate the proposed methods.
no code implementations • 10 Jun 2020 • Hye-jin Shim, Jee-weon Jung, Ju-ho Kim, Seung-bin Kim, Ha-Jin Yu
In this paper, we propose two approaches for building an integrated system of speaker verification and presentation attack detection: an end-to-end monolithic approach and a back-end modular approach.
1 code implementation • 7 May 2020 • Seung-bin Kim, Jee-weon Jung, Hye-jin Shim, Ju-ho Kim, Ha-Jin Yu
The proposed method segments an input utterance into several short utterances and then aggregates the segment embeddings extracted from the segmented inputs to compose a speaker embedding.
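The segment-and-aggregate step described in this abstract can be sketched directly. The windowing parameters and the plain mean pooling below are assumptions; the paper's aggregation may be learned (e.g., attention-weighted) rather than a simple average:

```python
import numpy as np

def aggregate_segment_embeddings(utterance, embed_fn, seg_len, hop=None):
    """Split an utterance into short segments, embed each, and average.

    `embed_fn` stands in for a trained speaker-embedding extractor; any
    function mapping a 1-D segment to a fixed-size vector works here.
    """
    hop = hop or seg_len
    segments = [utterance[i:i + seg_len]
                for i in range(0, len(utterance) - seg_len + 1, hop)]
    embs = np.stack([embed_fn(s) for s in segments])
    emb = embs.mean(axis=0)
    return emb / (np.linalg.norm(emb) + 1e-12)  # length-normalise
```

Training the extractor on short segments while scoring with the aggregated embedding makes the system consistent across enrollment and short-test conditions.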
2 code implementations • 1 Apr 2020 • Jee-weon Jung, Seung-bin Kim, Hye-jin Shim, Ju-ho Kim, Ha-Jin Yu
Recent advances in deep learning have facilitated the design of speaker verification systems that directly input raw waveforms.
no code implementations • 31 Jan 2020 • Jee-weon Jung, Hye-jin Shim, Hee-Soo Heo, Ha-Jin Yu
In addition, we utilize a multi-task learning framework to include subsidiary information in the code.
no code implementations • 22 Oct 2019 • Hye-jin Shim, Hee-Soo Heo, Jee-weon Jung, Ha-Jin Yu
Constructing a dataset for replay spoofing detection requires a physical process of playing an utterance and re-recording it, presenting a challenge to the collection of large-scale datasets.
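One way around the physical play-and-record process is to approximate the replay channel digitally. The purely linear convolution chain below is an illustrative simplification (real loudspeakers and microphones are nonlinear), and is not necessarily the method this paper proposes:

```python
import numpy as np

def simulate_replay(speech, playback_ir, room_ir, recording_ir):
    """Approximate a replayed recording as a chain of convolutions with
    playback-device, room, and recording-device impulse responses.

    All three impulse responses are hypothetical inputs; in practice
    they would come from measured or synthetic IR databases.
    """
    x = np.convolve(speech, playback_ir)
    x = np.convolve(x, room_ir)
    x = np.convolve(x, recording_ir)
    return x / (np.abs(x).max() + 1e-12)  # peak-normalise
```

Generating spoofed examples this way scales to large datasets without the cost of physically re-recording every utterance.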
no code implementations • 1 Jul 2019 • Hee-Soo Heo, Jee-weon Jung, Hye-jin Shim, IL-Ho Yang, Ha-Jin Yu
In particular, the adversarial process degrades the performance of the subsidiary model by eliminating from the input the subsidiary information that, by assumption, may degrade the performance of the primary model.
1 code implementation • 23 Apr 2019 • Jee-weon Jung, Hye-jin Shim, Hee-Soo Heo, Ha-Jin Yu
To detect unrevealed characteristics that reside in replayed speech, we directly input spectrograms into an end-to-end DNN without knowledge-based intervention.
4 code implementations • 17 Apr 2019 • Jee-weon Jung, Hee-Soo Heo, Ju-ho Kim, Hye-jin Shim, Ha-Jin Yu
In this study, we explore end-to-end deep neural networks that input raw waveforms to improve various aspects: front-end speaker embedding extraction including model architecture, pre-training scheme, additional objective functions, and back-end classification.
no code implementations • 7 Feb 2019 • Hee-Soo Heo, Jee-weon Jung, IL-Ho Yang, Sung-Hyun Yoon, Hye-jin Shim, Ha-Jin Yu
Each speaker basis is designed to represent the corresponding speaker in the process of training deep neural networks.
no code implementations • 25 Oct 2018 • Jee-weon Jung, Hee-Soo Heo, Hye-jin Shim, Ha-Jin Yu
The short duration of an input utterance is one of the most critical threats that degrade the performance of speaker verification systems.
no code implementations • 29 Aug 2018 • Hye-jin Shim, Jee-weon Jung, Hee-Soo Heo, Sung-Hyun Yoon, Ha-Jin Yu
We explore the effectiveness of training a deep neural network simultaneously for replay attack spoofing detection and replay noise classification.