no code implementations • 13 Feb 2025 • Xin Wang, Héctor Delgado, Hemlata Tak, Jee-weon Jung, Hye-jin Shim, Massimiliano Todisco, Ivan Kukanov, Xuechen Liu, Md Sahidullah, Tomi Kinnunen, Nicholas Evans, Kong Aik Lee, Junichi Yamagishi, Myeonghun Jeong, Ge Zhu, Yongyi Zang, You Zhang, Soumi Maiti, Florian Lux, Nicolas Müller, Wangyou Zhang, Chengzhe Sun, Shuwei Hou, Siwei Lyu, Sébastien Le Maguer, Cheng Gong, Hanjie Guo, Liping Chen, Vishwanath Singh
The database contains attacks generated with 32 different algorithms, also crowdsourced, and optimised to varying degrees using new surrogate detection models.
no code implementations • 6 Feb 2025 • Jagabandhu Mishra, Manasi Chhibber, Hye-jin Shim, Tomi H. Kinnunen
We use these probabilistic embeddings with four classifier back-ends to address two downstream tasks: spoofing detection and spoofing attack attribution.
no code implementations • 18 Sep 2024 • Jee-weon Jung, Yihan Wu, Xin Wang, Ji-Hoon Kim, Soumi Maiti, Yuta Matsunaga, Hye-jin Shim, Jinchuan Tian, Nicholas Evans, Joon Son Chung, Wangyou Zhang, Seyun Um, Shinnosuke Takamichi, Shinji Watanabe
This paper introduces SpoofCeleb, a dataset designed for Speech Deepfake Detection (SDD) and Spoofing-robust Automatic Speaker Verification (SASV), utilizing source data from real-world conditions and spoofing attacks generated by Text-To-Speech (TTS) systems also trained on the same real-world data.
no code implementations • 13 Sep 2024 • Jee-weon Jung, Wangyou Zhang, Soumi Maiti, Yihan Wu, Xin Wang, Ji-Hoon Kim, Yuta Matsunaga, Seyun Um, Jinchuan Tian, Hye-jin Shim, Nicholas Evans, Joon Son Chung, Shinnosuke Takamichi, Shinji Watanabe
The recent literature nonetheless shows efforts to train TTS systems using data collected in the wild.
no code implementations • 16 Aug 2024 • Xin Wang, Hector Delgado, Hemlata Tak, Jee-weon Jung, Hye-jin Shim, Massimiliano Todisco, Ivan Kukanov, Xuechen Liu, Md Sahidullah, Tomi Kinnunen, Nicholas Evans, Kong Aik Lee, Junichi Yamagishi
ASVspoof 5 is the fifth edition in a series of challenges that promote the study of speech spoofing and deepfake attacks, and the design of detection solutions.
no code implementations • 25 Jun 2024 • Hye-jin Shim, Md Sahidullah, Jee-weon Jung, Shinji Watanabe, Tomi Kinnunen
Our investigations highlight the significant differences in training dynamics between the two classes, emphasizing the need for future research to focus on robust modeling of the bonafide class.
no code implementations • 8 Jun 2024 • Jee-weon Jung, Xin Wang, Nicholas Evans, Shinji Watanabe, Hye-jin Shim, Hemlata Tak, Sidhhant Arora, Junichi Yamagishi, Joon Son Chung
The current automatic speaker verification (ASV) task involves making binary decisions on two types of trials: target and non-target.
1 code implementation • 3 Mar 2024 • Hye-jin Shim, Jee-weon Jung, Tomi Kinnunen, Nicholas Evans, Jean-Francois Bonastre, Itshak Lapidot
Spoofing detection is today a mainstream research topic.
1 code implementation • 31 May 2023 • Hye-jin Shim, Jee-weon Jung, Tomi Kinnunen
Audio anti-spoofing for automatic speaker verification aims to safeguard users' identities from spoofing attacks.
no code implementations • 31 May 2023 • Hye-jin Shim, Rosa González Hautamäki, Md Sahidullah, Tomi Kinnunen
Shortcut learning, or the "Clever Hans effect", refers to situations where a learning agent (e.g., a deep neural network) learns spurious correlations present in data, resulting in biased models.
1 code implementation • 30 May 2023 • Sung Hwan Mun, Hye-jin Shim, Hemlata Tak, Xin Wang, Xuechen Liu, Md Sahidullah, Myeonghun Jeong, Min Hyun Han, Massimiliano Todisco, Kong Aik Lee, Junichi Yamagishi, Nicholas Evans, Tomi Kinnunen, Nam Soo Kim, Jee-weon Jung
Second, competitive performance should be demonstrated compared to the fusion of automatic speaker verification (ASV) and countermeasure (CM) embeddings, which outperformed single embedding solutions by a large margin in the SASV2022 challenge.
no code implementations • 20 Oct 2022 • Jee-weon Jung, Hee-Soo Heo, Bong-Jin Lee, Jaesong Lee, Hye-jin Shim, Youngki Kwon, Joon Son Chung, Shinji Watanabe
We also show that training with proposed large data configurations gives better performance.
1 code implementation • 27 Jun 2022 • Ju-ho Kim, Jungwoo Heo, Hye-jin Shim, Ha-Jin Yu
Background noise is a well-known factor that deteriorates the accuracy and reliability of speaker verification (SV) systems by blurring speech intelligibility.
no code implementations • 28 Mar 2022 • Jee-weon Jung, Hemlata Tak, Hye-jin Shim, Hee-Soo Heo, Bong-Jin Lee, Soo-Whan Chung, Ha-Jin Yu, Nicholas Evans, Tomi Kinnunen
Pre-trained spoofing detection and speaker verification models are provided as open source and are used in two baseline SASV solutions.
1 code implementation • 15 Dec 2021 • Ju-ho Kim, Hye-jin Shim, Jungwoo Heo, Ha-Jin Yu
Despite achieving satisfactory performance in speaker verification using deep neural networks, variable-duration utterances remain a challenge that threatens the robustness of systems.
2 code implementations • 4 Oct 2021 • Jee-weon Jung, Hee-Soo Heo, Hemlata Tak, Hye-jin Shim, Joon Son Chung, Bong-Jin Lee, Ha-Jin Yu, Nicholas Evans
Artefacts that differentiate spoofed from bona-fide utterances can reside in spectral or temporal domains.
Ranked #1 on Voice Anti-spoofing on ASVspoof 2019 - LA
no code implementations • 15 Apr 2021 • Hye-jin Shim, Jee-weon Jung, Ju-ho Kim, Ha-Jin Yu
Furthermore, adopting the proposed attentive max feature map, our team placed fourth in the recent DCASE 2021 challenge.
no code implementations • 14 Apr 2021 • Ju-ho Kim, Hye-jin Shim, Jee-weon Jung, Ha-Jin Yu
By learning the reliable intermediate representation of the mean teacher network, we expect that the proposed method can explore more discriminatory embedding spaces and improve the generalization performance of the speaker verification system.
1 code implementation • 21 Sep 2020 • Jee-weon Jung, Hye-jin Shim, Ju-ho Kim, Ha-Jin Yu
Single-task deep neural networks that perform a target task among diverse cross-related tasks in the acoustic scene and event literature are being developed.
no code implementations • 9 Jul 2020 • Hye-jin Shim, Jee-weon Jung, Ju-ho Kim, Ha-Jin Yu
Various experiments are conducted using the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Task 1-A dataset to validate the proposed methods.
no code implementations • 10 Jun 2020 • Hye-jin Shim, Jee-weon Jung, Ju-ho Kim, Seung-bin Kim, Ha-Jin Yu
In this paper, we propose two approaches for building an integrated system of speaker verification and presentation attack detection: an end-to-end monolithic approach and a back-end modular approach.
1 code implementation • 7 May 2020 • Seung-bin Kim, Jee-weon Jung, Hye-jin Shim, Ju-ho Kim, Ha-Jin Yu
The proposed method segments an input utterance into several short utterances and then aggregates the segment embeddings extracted from the segmented inputs to compose a speaker embedding.
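The segment-then-aggregate idea in the entry above can be illustrated with a minimal sketch. It is not the authors' implementation: the fixed segment length, the hop size, the use of plain averaging as the aggregation step, and the `toy_extractor` stand-in for a trained speaker embedding network are all assumptions made here for illustration.

```python
from typing import Callable, List, Sequence

Embedding = List[float]


def split_into_segments(samples: Sequence[float], seg_len: int, hop: int) -> List[List[float]]:
    """Cut a waveform into fixed-length segments taken every `hop` samples."""
    if seg_len <= 0 or hop <= 0:
        raise ValueError("seg_len and hop must be positive")
    if len(samples) <= seg_len:
        # Utterance is already short: treat it as a single segment.
        return [list(samples)]
    starts = range(0, len(samples) - seg_len + 1, hop)
    return [list(samples[s:s + seg_len]) for s in starts]


def aggregate_mean(embeddings: List[Embedding]) -> Embedding:
    """Average segment embeddings dimension by dimension (assumed aggregation)."""
    n = len(embeddings)
    dim = len(embeddings[0])
    return [sum(e[d] for e in embeddings) / n for d in range(dim)]


def utterance_embedding(
    samples: Sequence[float],
    extract: Callable[[List[float]], Embedding],
    seg_len: int,
    hop: int,
) -> Embedding:
    """Segment the input, embed each segment, then aggregate into one embedding."""
    segments = split_into_segments(samples, seg_len, hop)
    return aggregate_mean([extract(seg) for seg in segments])


def toy_extractor(segment: List[float]) -> Embedding:
    """Hypothetical placeholder for a trained network: [mean, mean absolute value]."""
    n = len(segment)
    return [sum(segment) / n, sum(abs(x) for x in segment) / n]


if __name__ == "__main__":
    wave = [1.0, -1.0, 2.0, -2.0, 3.0, -3.0]
    print(utterance_embedding(wave, toy_extractor, seg_len=2, hop=2))
```

In practice `extract` would be a trained speaker embedding model, and the aggregation could be learned rather than a simple mean.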
2 code implementations • 1 Apr 2020 • Jee-weon Jung, Seung-bin Kim, Hye-jin Shim, Ju-ho Kim, Ha-Jin Yu
Recent advances in deep learning have facilitated the design of speaker verification systems that directly input raw waveforms.
no code implementations • 31 Jan 2020 • Jee-weon Jung, Hye-jin Shim, Hee-Soo Heo, Ha-Jin Yu
In addition, we utilize the multi-task learning framework to include subsidiary information in the code.
no code implementations • 22 Oct 2019 • Hye-jin Shim, Hee-Soo Heo, Jee-weon Jung, Ha-Jin Yu
Constructing a dataset for replay spoofing detection requires a physical process of playing an utterance and re-recording it, presenting a challenge to the collection of large-scale datasets.
no code implementations • 1 Jul 2019 • Hee-Soo Heo, Jee-weon Jung, Hye-jin Shim, IL-Ho Yang, Ha-Jin Yu
In particular, the adversarial process degrades the performance of the subsidiary model by eliminating the subsidiary information in the input which, by assumption, may degrade the performance of the primary model.
1 code implementation • 23 Apr 2019 • Jee-weon Jung, Hye-jin Shim, Hee-Soo Heo, Ha-Jin Yu
To detect unrevealed characteristics that reside in a replayed speech, we directly input spectrograms into an end-to-end DNN without knowledge-based intervention.
5 code implementations • 17 Apr 2019 • Jee-weon Jung, Hee-Soo Heo, Ju-ho Kim, Hye-jin Shim, Ha-Jin Yu
In this study, we explore end-to-end deep neural networks that input raw waveforms to improve various aspects: front-end speaker embedding extraction including model architecture, pre-training scheme, additional objective functions, and back-end classification.
no code implementations • 7 Feb 2019 • Hee-Soo Heo, Jee-weon Jung, IL-Ho Yang, Sung-Hyun Yoon, Hye-jin Shim, Ha-Jin Yu
Each speaker basis is designed to represent the corresponding speaker in the process of training deep neural networks.
no code implementations • 25 Oct 2018 • Jee-weon Jung, Hee-Soo Heo, Hye-jin Shim, Ha-Jin Yu
The short duration of an input utterance is one of the most critical threats that degrade the performance of speaker verification systems.
no code implementations • 29 Aug 2018 • Hye-jin Shim, Jee-weon Jung, Hee-Soo Heo, Sung-Hyun Yoon, Ha-Jin Yu
We explore the effectiveness of training a deep neural network simultaneously for replay attack spoofing detection and replay noise classification.