1 code implementation • 11 Jun 2024 • Seung-bin Kim, Chan-yeong Lim, Jungwoo Heo, Ju-ho Kim, Hyun-seo Shin, Kyo-Won Koo, Ha-Jin Yu
In speaker verification systems, the use of short utterances remains a persistent challenge: performance degrades primarily because short segments carry insufficient phonetic information to characterize the speaker.
1 code implementation • 15 Sep 2023 • Hyun-seo Shin, Jungwoo Heo, Ju-ho Kim, Chan-yeong Lim, Wonbin Kim, Ha-Jin Yu
Audio deepfake detection (ADD) is the task of detecting spoofing attacks generated by text-to-speech or voice conversion systems.
1 code implementation • 14 Sep 2023 • Ju-ho Kim, Jungwoo Heo, Hyun-seo Shin, Chan-yeong Lim, Ha-Jin Yu
Diff-SV unifies a DPM-based speech enhancement system with a speaker embedding extractor, and yields a discriminative and noise-tolerant speaker representation through a hierarchical structure.
1 code implementation • 20 Jul 2023 • Wonbin Kim, Hyun-seo Shin, Ju-ho Kim, Jungwoo Heo, Chan-yeong Lim, Ha-Jin Yu
In this paper, we propose a new additive noise method, partial additive speech (PAS), which aims to train speaker verification (SV) systems to be less affected by noisy environments.
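The excerpt does not spell out the mechanism, but the name suggests adding noise to only part of each training utterance rather than the whole waveform. Below is a minimal illustrative sketch of that idea; the contiguous-segment selection, portion size, and SNR scaling are assumptions, not the paper's exact recipe.

```python
import numpy as np

def partial_additive_noise(speech, noise, snr_db=10.0, portion=0.5, rng=None):
    """Add noise to only a contiguous portion of the utterance (illustrative sketch).

    speech, noise: 1-D float arrays at the same sample rate.
    portion: fraction of the utterance that receives additive noise (assumption).
    """
    rng = np.random.default_rng() if rng is None else rng
    seg_len = int(len(speech) * portion)
    start = rng.integers(0, len(speech) - seg_len + 1)

    # Tile or crop the noise to match the selected segment.
    noise = np.resize(noise, seg_len)

    # Scale the noise to the requested SNR within the noisy segment.
    speech_power = np.mean(speech[start:start + seg_len] ** 2) + 1e-12
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))

    noisy = speech.copy()
    noisy[start:start + seg_len] += scale * noise
    return noisy
```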
1 code implementation • 27 May 2023 • Jungwoo Heo, Chan-yeong Lim, Ju-ho Kim, Hyun-seo Shin, Ha-Jin Yu
This paper proposes One-Step Knowledge Distillation and Fine-Tuning (OS-KDFT), which incorporates knowledge distillation (KD) and fine-tuning (FT) into a single training step.
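As a rough illustration of combining distillation and task fine-tuning in one pass, the sketch below mixes a standard KD loss with the downstream classification loss; the temperature, weighting, and loss choices are generic assumptions, not the OS-KDFT formulation.

```python
import torch
import torch.nn.functional as F

def kd_plus_ft_loss(student_logits, teacher_logits, labels,
                    temperature=2.0, alpha=0.5):
    """Single-step loss mixing knowledge distillation with fine-tuning.

    Generic sketch: alpha weights the KL distillation term against the task
    cross-entropy; the values here are assumptions, not the paper's settings.
    """
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```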
1 code implementation • 4 Nov 2022 • Ju-ho Kim, Jungwoo Heo, Hyun-seo Shin, Chan-yeong Lim, Ha-Jin Yu
To overcome these limitations, the present study explores and applies efficient transfer learning methods in the audio domain.
no code implementations • 3 Nov 2022 • Jungwoo Heo, Hyun-seo Shin, Ju-ho Kim, Chan-yeong Lim, Ha-Jin Yu
In music, short-term features such as pitch and tempo constitute long-term semantic features such as melody and narrative.
no code implementations • 28 Jun 2022 • Jungwoo Heo, Ju-ho Kim, Hyun-seo Shin
The use of deep neural networks (DNNs) has dramatically elevated the performance of automatic speaker verification (ASV) over the last decade.
1 code implementation • 27 Jun 2022 • Ju-ho Kim, Jungwoo Heo, Hye-jin Shim, Ha-Jin Yu
Background noise is a well-known factor that deteriorates the accuracy and reliability of speaker verification (SV) systems by blurring speech intelligibility.
1 code implementation • 15 Dec 2021 • Ju-ho Kim, Hye-jin Shim, Jungwoo Heo, Ha-Jin Yu
Despite achieving satisfactory performance in speaker verification using deep neural networks, variable-duration utterances remain a challenge that threatens the robustness of systems.
no code implementations • 15 Apr 2021 • Hye-jin Shim, Jee-weon Jung, Ju-ho Kim, Ha-Jin Yu
Furthermore, by adopting the proposed attentive max feature map, our team placed fourth in the recent DCASE 2021 challenge.
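The max feature map (MFM) itself is the standard operation from Light CNN: the channels are split in half and only the element-wise maximum is kept. The sketch below shows that baseline operation; the attentive extension proposed in the paper is not reproduced here.

```python
import torch

def max_feature_map(x: torch.Tensor) -> torch.Tensor:
    """Standard max feature map: split channels in half, keep the element-wise max.

    x: tensor of shape (batch, channels, ...) with an even channel count.
    """
    a, b = torch.chunk(x, 2, dim=1)
    return torch.max(a, b)
```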
no code implementations • 14 Apr 2021 • Ju-ho Kim, Hye-jin Shim, Jee-weon Jung, Ha-Jin Yu
By learning the reliable intermediate representations of the mean teacher network, we expect the proposed method to explore more discriminative embedding spaces and improve the generalization performance of the speaker verification system.
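Mean teacher here refers to the standard scheme in which the teacher's weights are an exponential moving average (EMA) of the student's, and the student is additionally trained to match the teacher's representations. A minimal sketch of the EMA update follows; the decay value is an assumption.

```python
import torch

@torch.no_grad()
def update_mean_teacher(teacher: torch.nn.Module,
                        student: torch.nn.Module,
                        ema_decay: float = 0.999) -> None:
    """Update teacher weights as an exponential moving average of the student's."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(ema_decay).add_(s_param, alpha=1.0 - ema_decay)
```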
1 code implementation • 21 Sep 2020 • Jee-weon Jung, Hye-jin Shim, Ju-ho Kim, Ha-Jin Yu
In the acoustic scene and event literature, single-task deep neural networks are being developed that each perform one target task among diverse cross-related tasks.
no code implementations • 9 Jul 2020 • Hye-jin Shim, Jee-weon Jung, Ju-ho Kim, Ha-Jin Yu
Various experiments are conducted using the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Task 1-A dataset to validate the proposed methods.
no code implementations • 10 Jun 2020 • Hye-jin Shim, Jee-weon Jung, Ju-ho Kim, Seung-bin Kim, Ha-Jin Yu
In this paper, we propose two approaches for building an integrated system of speaker verification and presentation attack detection: an end-to-end monolithic approach and a back-end modular approach.
1 code implementation • 7 May 2020 • Seung-bin Kim, Jee-weon Jung, Hye-jin Shim, Ju-ho Kim, Ha-Jin Yu
The proposed method segments an input utterance into several short utterances and then aggregates the segment embeddings extracted from the segmented inputs to compose a speaker embedding.
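A minimal sketch of this segment-and-aggregate idea: split the input into fixed-length segments, extract an embedding per segment, and pool them into a single speaker embedding. The segment length and mean-pooling aggregation below are assumptions; the paper's aggregation module may differ.

```python
import torch

def segmented_speaker_embedding(utterance: torch.Tensor,
                                extractor: torch.nn.Module,
                                segment_len: int = 16000) -> torch.Tensor:
    """Split a 1-D waveform into fixed-length segments, embed each, and average.

    utterance: shape (num_samples,). Trailing samples shorter than a full
    segment are simply dropped in this sketch.
    """
    num_segments = utterance.shape[0] // segment_len
    segments = utterance[: num_segments * segment_len].reshape(num_segments, segment_len)
    embeddings = extractor(segments)            # (num_segments, embed_dim)
    return embeddings.mean(dim=0)               # single speaker embedding
```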
2 code implementations • 1 Apr 2020 • Jee-weon Jung, Seung-bin Kim, Hye-jin Shim, Ju-ho Kim, Ha-Jin Yu
Recent advances in deep learning have facilitated the design of speaker verification systems that directly input raw waveforms.
4 code implementations • 17 Apr 2019 • Jee-weon Jung, Hee-Soo Heo, Ju-ho Kim, Hye-jin Shim, Ha-Jin Yu
In this study, we explore end-to-end deep neural networks that input raw waveforms to improve various aspects: front-end speaker embedding extraction including model architecture, pre-training scheme, additional objective functions, and back-end classification.
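To make the raw-waveform-in, embedding-out setup concrete, below is a generic strided 1-D convolutional front-end followed by temporal pooling; this is an illustrative stand-in, not the architecture studied in the paper.

```python
import torch
import torch.nn as nn

class RawWaveformEmbedder(nn.Module):
    """Toy raw-waveform speaker embedder: strided 1-D convs plus global pooling."""

    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.frontend = nn.Sequential(
            nn.Conv1d(1, 64, kernel_size=251, stride=16), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv1d(128, embed_dim, kernel_size=5, stride=2), nn.ReLU(),
        )

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, num_samples) -> add a channel axis for Conv1d.
        feats = self.frontend(waveform.unsqueeze(1))
        return feats.mean(dim=-1)  # temporal average pooling to a fixed-size embedding
```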
no code implementations • WS 2018 • Ju-ho Kim, Christopher Malon, Asim Kadav
Existing entailment datasets mainly pose problems which can be answered without attention to grammar or word order.
7 code implementations • 19 Jun 2017 • Keunwoo Choi, Deokjin Joo, Ju-ho Kim
We introduce Kapre, Keras layers for audio and music signal preprocessing.
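A brief usage sketch of the idea: spectrogram extraction becomes an ordinary Keras layer computed on the fly inside the model. Layer and argument names follow Kapre's documented composed API in recent versions; treat the exact signature and values as assumptions and check the repository's README.

```python
# Hedged usage sketch: names follow Kapre's composed API (>= 0.3); verify
# against the repository's documentation before use.
import tensorflow as tf
from kapre.composed import get_melspectrogram_layer

# Log-mel spectrogram extraction as a Keras layer, computed on-the-fly.
mel_layer = get_melspectrogram_layer(
    input_shape=(16000, 1),   # 1 second of 16 kHz mono audio, channels-last
    n_fft=512,
    hop_length=160,
    sample_rate=16000,
    n_mels=64,
    return_decibel=True,
)

model = tf.keras.Sequential([
    mel_layer,
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
```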