no code implementations • 13 Feb 2025 • Xin Wang, Héctor Delgado, Hemlata Tak, Jee-weon Jung, Hye-jin Shim, Massimiliano Todisco, Ivan Kukanov, Xuechen Liu, Md Sahidullah, Tomi Kinnunen, Nicholas Evans, Kong Aik Lee, Junichi Yamagishi, Myeonghun Jeong, Ge Zhu, Yongyi Zang, You Zhang, Soumi Maiti, Florian Lux, Nicolas Müller, Wangyou Zhang, Chengzhe Sun, Shuwei Hou, Siwei Lyu, Sébastien Le Maguer, Cheng Gong, Hanjie Guo, Liping Chen, Vishwanath Singh
The database contains attacks generated with 32 different algorithms, which were also crowdsourced and optimised to varying degrees using new surrogate detection models.
2 code implementations • 24 Sep 2024 • Jiatong Shi, Jinchuan Tian, Yihan Wu, Jee-weon Jung, Jia Qi Yip, Yoshiki Masuyama, William Chen, Yuning Wu, Yuxun Tang, Massa Baali, Dareen Alharhi, Dong Zhang, Ruifan Deng, Tejes Srivastava, Haibin Wu, Alexander H. Liu, Bhiksha Raj, Qin Jin, Ruihua Song, Shinji Watanabe
Neural codecs have become crucial to recent speech and audio generation research.
no code implementations • 18 Sep 2024 • Jee-weon Jung, Yihan Wu, Xin Wang, Ji-Hoon Kim, Soumi Maiti, Yuta Matsunaga, Hye-jin Shim, Jinchuan Tian, Nicholas Evans, Joon Son Chung, Wangyou Zhang, Seyun Um, Shinnosuke Takamichi, Shinji Watanabe
This paper introduces SpoofCeleb, a dataset designed for Speech Deepfake Detection (SDD) and Spoofing-robust Automatic Speaker Verification (SASV). It utilizes source data from real-world conditions and spoofing attacks generated by Text-To-Speech (TTS) systems that were also trained on the same real-world data.
no code implementations • 16 Sep 2024 • Zakaria Aldeneh, Takuya Higuchi, Jee-weon Jung, Li-Wei Chen, Stephen Shum, Ahmed Hussen Abdelaziz, Shinji Watanabe, Tatiana Likhomanenko, Barry-John Theobald
To this end, we show that the simple, well-studied, and established i-vector generative model is enough to bootstrap the iterative pseudo-labeling (IPL) process for the unsupervised learning of speaker representations.
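The bootstrapping idea lends itself to a compact sketch: cluster fixed i-vectors into pseudo speaker labels, train a model on those labels, then re-label and repeat. Below is a minimal, illustrative sketch using scikit-learn, with random vectors standing in for real i-vectors and a linear classifier standing in for the speaker-embedding network; none of it is the paper's code.

```python
# Minimal sketch of iterative pseudo-labeling (IPL) bootstrapped from i-vectors.
# Random vectors stand in for real i-vectors; a linear classifier stands in
# for the speaker-embedding network. Illustrative only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
ivectors = rng.normal(size=(1000, 64))      # one i-vector per utterance
n_pseudo_speakers = 20

# Round 0: pseudo-labels come from clustering the i-vectors.
labels = KMeans(n_clusters=n_pseudo_speakers, n_init=10,
                random_state=0).fit_predict(ivectors)

for it in range(3):
    # Train the (stand-in) embedding model on the current pseudo-labels.
    model = LogisticRegression(max_iter=1000).fit(ivectors, labels)
    # Re-label: use the model's own outputs (here, class posteriors)
    # as features for the next clustering round.
    posteriors = model.predict_proba(ivectors)
    labels = KMeans(n_clusters=n_pseudo_speakers, n_init=10,
                    random_state=it).fit_predict(posteriors)
```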
no code implementations • 13 Sep 2024 • Jee-weon Jung, Wangyou Zhang, Soumi Maiti, Yihan Wu, Xin Wang, Ji-Hoon Kim, Yuta Matsunaga, Seyun Um, Jinchuan Tian, Hye-jin Shim, Nicholas Evans, Joon Son Chung, Shinnosuke Takamichi, Shinji Watanabe
The recent literature nonetheless shows efforts to train TTS systems using data collected in the wild.
no code implementations • 27 Aug 2024 • Jaesung Huh, Joon Son Chung, Arsha Nagrani, Andrew Brown, Jee-weon Jung, Daniel Garcia-Romero, Andrew Zisserman
In this paper, we provide a review of these challenges that covers: what they explored; the methods developed by the challenge participants and how these evolved; and the current state of the field for speaker verification and diarisation.
no code implementations • 16 Aug 2024 • Xin Wang, Hector Delgado, Hemlata Tak, Jee-weon Jung, Hye-jin Shim, Massimiliano Todisco, Ivan Kukanov, Xuechen Liu, Md Sahidullah, Tomi Kinnunen, Nicholas Evans, Kong Aik Lee, Junichi Yamagishi
ASVspoof 5 is the fifth edition in a series of challenges that promote the study of speech spoofing and deepfake attacks, and the design of detection solutions.
no code implementations • 25 Jun 2024 • Hye-jin Shim, Md Sahidullah, Jee-weon Jung, Shinji Watanabe, Tomi Kinnunen
Our investigations highlight the significant differences in training dynamics between the two classes, emphasizing the need for future research to focus on robust modeling of the bonafide class.
no code implementations • 14 Jun 2024 • Siddhant Arora, Ankita Pasad, Chung-Ming Chien, Jionghao Han, Roshan Sharma, Jee-weon Jung, Hira Dhamyal, William Chen, Suwon Shon, Hung-Yi Lee, Karen Livescu, Shinji Watanabe
To answer this, we perform an extensive evaluation of multiple supervised and self-supervised SFMs using several evaluation protocols: (i) frozen SFMs with a lightweight prediction head, (ii) frozen SFMs with a complex prediction head, and (iii) fine-tuned SFMs with a lightweight prediction head.
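The three protocols differ only in which parameters are trainable. A minimal sketch with a dummy encoder in PyTorch (the model, head sizes, and protocol names are placeholders, not any released SFM):

```python
# Sketch of the three evaluation protocols with a dummy speech foundation
# model (SFM). The encoder and heads are placeholders, not a real model.
import torch
import torch.nn as nn

sfm = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 256))
lightweight_head = nn.Linear(256, 10)                     # e.g. 10 classes
complex_head = nn.Sequential(nn.Linear(256, 256), nn.ReLU(),
                             nn.Linear(256, 10))

def trainable_params(protocol):
    if protocol == "frozen+light":        # (i) frozen SFM, lightweight head
        sfm.requires_grad_(False)
        return list(lightweight_head.parameters())
    if protocol == "frozen+complex":      # (ii) frozen SFM, complex head
        sfm.requires_grad_(False)
        return list(complex_head.parameters())
    if protocol == "finetune+light":      # (iii) fine-tuned SFM, light head
        sfm.requires_grad_(True)
        return list(sfm.parameters()) + list(lightweight_head.parameters())

opt = torch.optim.Adam(trainable_params("frozen+light"), lr=1e-3)
```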
no code implementations • 8 Jun 2024 • Jee-weon Jung, Xin Wang, Nicholas Evans, Shinji Watanabe, Hye-jin Shim, Hemlata Tak, Siddhant Arora, Junichi Yamagishi, Joon Son Chung
The current automatic speaker verification (ASV) task involves making binary decisions on two types of trials: target and non-target.
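For reference, the standard metric over such trials is the equal error rate (EER), the operating point where the false-positive and false-negative rates coincide. A small self-contained example with synthetic scores:

```python
# Computing the equal error rate (EER) from target / non-target trial scores.
# Scores here are synthetic; in practice they come from an ASV system.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
target_scores = rng.normal(1.0, 1.0, 500)      # same-speaker trials
nontarget_scores = rng.normal(-1.0, 1.0, 500)  # different-speaker trials

scores = np.concatenate([target_scores, nontarget_scores])
labels = np.concatenate([np.ones(500), np.zeros(500)])

fpr, tpr, _ = roc_curve(labels, scores)
fnr = 1 - tpr
eer = fpr[np.nanargmin(np.abs(fnr - fpr))]  # point where FPR == FNR
print(f"EER: {eer:.3f}")
```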
1 code implementation • 6 Jun 2024 • Wangyou Zhang, Kohei Saijo, Jee-weon Jung, Chenda Li, Shinji Watanabe, Yanmin Qian
Deep learning-based speech enhancement (SE) models have achieved impressive performance in the past decade.
1 code implementation • 3 Mar 2024 • Hye-jin Shim, Jee-weon Jung, Tomi Kinnunen, Nicholas Evans, Jean-Francois Bonastre, Itshak Lapidot
Spoofing detection is today a mainstream research topic.
1 code implementation • 25 Feb 2024 • Minsu Kim, Jee-weon Jung, Hyeongseop Rha, Soumi Maiti, Siddhant Arora, Xuankai Chang, Shinji Watanabe, Yong Man Ro
We propose a novel Tri-Modal Translation (TMT) model that translates between arbitrary modalities spanning speech, image, and text.
1 code implementation • 30 Jan 2024 • Yifan Peng, Jinchuan Tian, William Chen, Siddhant Arora, Brian Yan, Yui Sudo, Muhammad Shakeel, Kwanghee Choi, Jiatong Shi, Xuankai Chang, Jee-weon Jung, Shinji Watanabe
Recent studies have highlighted the importance of fully open foundation models.
2 code implementations • 30 Jan 2024 • Jee-weon Jung, Wangyou Zhang, Jiatong Shi, Zakaria Aldeneh, Takuya Higuchi, Barry-John Theobald, Ahmed Hussen Abdelaziz, Shinji Watanabe
First, we provide an open-source platform for researchers in the speaker recognition community to effortlessly build models.
Ranked #1 on Speaker Recognition on VoxCeleb1 (using extra training data)
1 code implementation • 25 Jan 2024 • Wangyou Zhang, Jee-weon Jung, Shinji Watanabe, Yanmin Qian
In this paper, we propose novel architectures to improve the input-condition-invariant SE model so that performance in simulated conditions remains competitive while degradation in real conditions is greatly mitigated.
1 code implementation • 10 Jan 2024 • Jee-weon Jung, Roshan Sharma, William Chen, Bhiksha Raj, Shinji Watanabe
We tackle this challenge by proposing AugSumm, a method to leverage large language models (LLMs) as a proxy for human annotators to generate augmented summaries for training and evaluation.
1 code implementation • 15 Dec 2023 • Kwanghee Choi, Jee-weon Jung, Shinji Watanabe
With the success of self-supervised representations, researchers seek a better understanding of the information encapsulated within a representation.
no code implementations • 4 Oct 2023 • Siddhant Arora, Hayato Futami, Jee-weon Jung, Yifan Peng, Roshan Sharma, Yosuke Kashiwagi, Emiru Tsunoo, Karen Livescu, Shinji Watanabe
Recent studies leverage large language models with multi-tasking capabilities, using natural language prompts to guide the model's behavior and surpassing the performance of task-specific models.
Ranked #1 on Spoken Language Understanding on Fluent Speech Commands (using extra training data) • Tasks: Automatic Speech Recognition (ASR), +3
no code implementations • 2 Oct 2023 • Samuele Cornell, Jee-weon Jung, Shinji Watanabe, Stefano Squartini
This paper presents a novel framework for joint speaker diarization (SD) and automatic speech recognition (ASR), named SLIDAR (sliding-window diarization-augmented recognition).
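The sliding-window idea itself is generic and easy to sketch: process a long recording in fixed-length overlapping windows and stitch the per-window outputs. In the sketch below, the per-window joint SD+ASR call is a stub, and the window/hop sizes are assumptions, not SLIDAR's actual settings:

```python
# Generic sliding-window processing for joint diarisation + ASR on long audio.
# `transcribe_and_diarise` is a stub standing in for the per-window model.
import numpy as np

def transcribe_and_diarise(window: np.ndarray, offset_s: float) -> list:
    """Stub: would return [(start_s, end_s, speaker, text), ...]."""
    return []

audio = np.zeros(16000 * 600)          # 10 minutes of 16 kHz audio
sr, win_s, hop_s = 16000, 30.0, 15.0   # 30 s windows, 50% overlap
results = []
for start in np.arange(0, len(audio) / sr - win_s + hop_s, hop_s):
    window = audio[int(start * sr): int((start + win_s) * sr)]
    results.extend(transcribe_and_diarise(window, offset_s=float(start)))
# Overlapping windows would then be stitched, e.g. by matching local
# speaker labels across adjacent windows.
```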
6 code implementations • 25 Sep 2023 • Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-weon Jung, Soumi Maiti, Shinji Watanabe
Pre-training speech models on large volumes of data has achieved remarkable success.
no code implementations • 14 Sep 2023 • Soumi Maiti, Yifan Peng, Shukjae Choi, Jee-weon Jung, Xuankai Chang, Shinji Watanabe
We propose a decoder-only language model, VoxtLM, that can perform four tasks: speech recognition, speech synthesis, text generation, and speech continuation.
no code implementations • 1 Jun 2023 • Jee-weon Jung, Soonshin Seo, Hee-Soo Heo, Geonmin Kim, You Jin Kim, Young-ki Kwon, Minjae Lee, Bong-Jin Lee
The task of speaker change detection (SCD), which detects points where speakers change in an input, is essential for several applications.
1 code implementation • 31 May 2023 • Hye-jin Shim, Jee-weon Jung, Tomi Kinnunen
Audio anti-spoofing for automatic speaker verification aims to safeguard users' identities from spoofing attacks.
1 code implementation • 30 May 2023 • Sung Hwan Mun, Hye-jin Shim, Hemlata Tak, Xin Wang, Xuechen Liu, Md Sahidullah, Myeonghun Jeong, Min Hyun Han, Massimiliano Todisco, Kong Aik Lee, Junichi Yamagishi, Nicholas Evans, Tomi Kinnunen, Nam Soo Kim, Jee-weon Jung
Second, competitive performance should be demonstrated compared to the fusion of automatic speaker verification (ASV) and countermeasure (CM) embeddings, which outperformed single embedding solutions by a large margin in the SASV2022 challenge.
1 code implementation • 20 Feb 2023 • Jaesung Huh, Andrew Brown, Jee-weon Jung, Joon Son Chung, Arsha Nagrani, Daniel Garcia-Romero, Andrew Zisserman
This paper summarises the findings from the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22), which was held in conjunction with INTERSPEECH 2022.
no code implementations • 9 Nov 2022 • Youngki Kwon, Hee-Soo Heo, Bong-Jin Lee, You Jin Kim, Jee-weon Jung
Our focus lies in developing an online speaker diarisation framework which demonstrates robust performance across diverse domains.
no code implementations • 8 Nov 2022 • Hee-Soo Heo, Youngki Kwon, Bong-Jin Lee, You Jin Kim, Jee-weon Jung
Extracted dense frame-level embeddings can each represent a speaker.
no code implementations • 1 Nov 2022 • Kihyun Nam, Youkyum Kim, Jaesung Huh, Hee Soo Heo, Jee-weon Jung, Joon Son Chung
The goal of this paper is to learn robust speaker representations for bilingual speaking scenarios.
no code implementations • 26 Oct 2022 • Jee-weon Jung, Hee-Soo Heo, Bong-Jin Lee, Jaesung Huh, Andrew Brown, Youngki Kwon, Shinji Watanabe, Joon Son Chung
First, the evaluation is not straightforward because the features required for better performance differ between speaker verification and diarisation.
no code implementations • 20 Oct 2022 • Jee-weon Jung, Hee-Soo Heo, Bong-Jin Lee, Jaesong Lee, Hye-jin Shim, Youngki Kwon, Joon Son Chung, Shinji Watanabe
We also show that training with the proposed large-data configurations gives better performance.
1 code implementation • 3 Apr 2022 • Sung Hwan Mun, Jee-weon Jung, Min Hyun Han, Nam Soo Kim
The SKA mechanism allows each convolutional layer to adaptively select the kernel size in a data-driven fashion.
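The selection idea can be illustrated with a toy block: run parallel convolutions with different kernel sizes and combine them with attention weights computed from the pooled features. This is a simplified illustration of the mechanism, not the paper's exact SKA layer:

```python
# Toy selective-kernel convolution: parallel branches with different kernel
# sizes, combined by attention weights computed from the pooled features.
import torch
import torch.nn as nn

class SelectiveKernel1d(nn.Module):
    def __init__(self, channels, kernel_sizes=(3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(channels, channels, k, padding=k // 2)
            for k in kernel_sizes)
        self.attn = nn.Linear(channels, len(kernel_sizes) * channels)
        self.n = len(kernel_sizes)
        self.c = channels

    def forward(self, x):                      # x: (B, C, T)
        feats = torch.stack([b(x) for b in self.branches], dim=1)  # (B, K, C, T)
        pooled = feats.sum(dim=1).mean(dim=-1)                     # (B, C)
        w = self.attn(pooled).view(-1, self.n, self.c, 1)
        w = torch.softmax(w, dim=1)            # attention over branches
        return (feats * w).sum(dim=1)          # (B, C, T)

y = SelectiveKernel1d(16)(torch.randn(2, 16, 100))
```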
no code implementations • 28 Mar 2022 • Hee-Soo Heo, Jee-weon Jung, Jingu Kang, Youngki Kwon, You Jin Kim, Bong-Jin Lee, Joon Son Chung
The goal of this paper is to learn effective self-supervised speaker representations without identity labels.
no code implementations • 28 Mar 2022 • Jee-weon Jung, Hemlata Tak, Hye-jin Shim, Hee-Soo Heo, Bong-Jin Lee, Soo-Whan Chung, Ha-Jin Yu, Nicholas Evans, Tomi Kinnunen
Pre-trained spoofing detection and speaker verification models are provided as open source and are used in two baseline SASV solutions.
2 code implementations • 16 Mar 2022 • Jee-weon Jung, You Jin Kim, Hee-Soo Heo, Bong-Jin Lee, Youngki Kwon, Joon Son Chung
Our best model achieves an equal error rate of 0.89%, which is competitive with the state-of-the-art models based on handcrafted features, and outperforms the best model based on raw waveform inputs by a large margin.
3 code implementations • 24 Feb 2022 • Hemlata Tak, Massimiliano Todisco, Xin Wang, Jee-weon Jung, Junichi Yamagishi, Nicholas Evans
The performance of spoofing countermeasure systems depends fundamentally upon the use of sufficiently representative training data.
Ranked #2 on Audio Deepfake Detection on ASVspoof 2021
no code implementations • 7 Oct 2021 • Youngki Kwon, Hee-Soo Heo, Jee-weon Jung, You Jin Kim, Bong-Jin Lee, Joon Son Chung
The objective of this work is effective speaker diarisation using multi-scale speaker embeddings.
no code implementations • 7 Oct 2021 • You Jin Kim, Hee-Soo Heo, Jee-weon Jung, Youngki Kwon, Bong-Jin Lee, Joon Son Chung
The objective of this work is to train noise-robust speaker embeddings adapted for speaker diarisation.
2 code implementations • 4 Oct 2021 • Jee-weon Jung, Hee-Soo Heo, Hemlata Tak, Hye-jin Shim, Joon Son Chung, Bong-Jin Lee, Ha-Jin Yu, Nicholas Evans
Artefacts that differentiate spoofed from bona-fide utterances can reside in spectral or temporal domains.
Ranked #1 on Voice Anti-spoofing on ASVspoof 2019 - LA
1 code implementation • 27 Jul 2021 • Hemlata Tak, Jee-weon Jung, Jose Patino, Madhu Kamble, Massimiliano Todisco, Nicholas Evans
Artefacts that serve to distinguish bona fide speech from spoofed or deepfake speech are known to reside in specific subbands and temporal segments.
Ranked #6 on Audio Deepfake Detection on ASVspoof 2021
no code implementations • 15 Apr 2021 • Hye-jin Shim, Jee-weon Jung, Ju-ho Kim, Ha-Jin Yu
Furthermore, by adopting the proposed attentive max feature map, our team placed fourth in the recent DCASE 2021 challenge.
no code implementations • 14 Apr 2021 • Ju-ho Kim, Hye-jin Shim, Jee-weon Jung, Ha-Jin Yu
By learning the reliable intermediate representation of the mean teacher network, we expect that the proposed method can explore more discriminatory embedding spaces and improve the generalization performance of the speaker verification system.
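The mean-teacher construction is standard: the teacher is an exponential moving average (EMA) of the student, and a consistency loss pulls the student's representation towards the teacher's. A minimal sketch (layer sizes and the MSE consistency loss are illustrative choices):

```python
# Mean-teacher sketch: the teacher is an EMA of the student, and a
# consistency loss pulls the student towards the teacher's representation.
import copy
import torch
import torch.nn as nn

student = nn.Sequential(nn.Linear(40, 128), nn.ReLU(), nn.Linear(128, 128))
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)

@torch.no_grad()
def ema_update(alpha=0.999):
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(alpha).add_(ps, alpha=1 - alpha)

x = torch.randn(8, 40)
consistency = nn.functional.mse_loss(student(x), teacher(x))
consistency.backward()
ema_update()     # called after each optimiser step in practice
```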
no code implementations • 8 Apr 2021 • Hemlata Tak, Jee-weon Jung, Jose Patino, Massimiliano Todisco, Nicholas Evans
This paper reports our use of graph attention networks (GATs) to model these relationships and to improve spoofing detection performance.
no code implementations • 7 Apr 2021 • Jee-weon Jung, Hee-Soo Heo, Youngki Kwon, Joon Son Chung, Bong-Jin Lee
In this work, we propose an overlapped speech detection system trained as a three-class classifier.
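A three-class formulation typically labels each frame as non-speech, single-speaker speech, or overlapped speech. A minimal frame-wise sketch with a placeholder model (the class convention is the usual one for OSD; the architecture is not the paper's):

```python
# Frame-wise three-class classifier for overlapped speech detection:
# class 0 = non-speech, 1 = single speaker, 2 = overlapped speech.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(80, 128), nn.ReLU(), nn.Linear(128, 3))
frames = torch.randn(4, 200, 80)            # (batch, frames, features)
labels = torch.randint(0, 3, (4, 200))      # per-frame class labels

logits = model(frames)                      # (4, 200, 3)
loss = nn.functional.cross_entropy(logits.reshape(-1, 3), labels.reshape(-1))
overlap_frames = logits.argmax(dim=-1) == 2  # boolean mask of overlap
```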
no code implementations • 7 Apr 2021 • Youngki Kwon, Jee-weon Jung, Hee-Soo Heo, You Jin Kim, Bong-Jin Lee, Joon Son Chung
The goal of this paper is to adapt speaker embeddings for solving the problem of speaker diarisation.
no code implementations • 22 Oct 2020 • Jee-weon Jung, Hee-Soo Heo, Ha-Jin Yu, Joon Son Chung
The proposed framework inputs segment-wise speaker embeddings from an enrollment and a test utterance and directly outputs a similarity score.
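As a point of contrast, the hand-crafted baseline that such a learned scorer replaces averages the segment embeddings and compares them with cosine similarity, e.g.:

```python
# Baseline contrast to the learned scorer: aggregate segment-wise embeddings
# by averaging and compare with cosine similarity.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
enrol_segments = rng.normal(size=(5, 192))   # 5 segment embeddings
test_segments = rng.normal(size=(3, 192))    # 3 segment embeddings

score = cosine(enrol_segments.mean(axis=0), test_segments.mean(axis=0))
```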
1 code implementation • 21 Sep 2020 • Jee-weon Jung, Hye-jin Shim, Ju-ho Kim, Ha-Jin Yu
Single-task deep neural networks that each perform one target task among the diverse, cross-related tasks in the acoustic scene and event literature are being developed.
no code implementations • 9 Jul 2020 • Hye-jin Shim, Jee-weon Jung, Ju-ho Kim, Ha-Jin Yu
Various experiments are conducted using the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Task 1-A dataset to validate the proposed methods.
no code implementations • 10 Jun 2020 • Hye-jin Shim, Jee-weon Jung, Ju-ho Kim, Seung-bin Kim, Ha-Jin Yu
In this paper, we propose two approaches for building an integrated system of speaker verification and presentation attack detection: an end-to-end monolithic approach and a back-end modular approach.
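The back-end modular approach can be sketched as plain score combination: run ASV and presentation attack detection (PAD) separately and fuse their scores. The score-sum rule below is one common choice, not necessarily the paper's exact fusion:

```python
# Sketch of the back-end modular approach: combine independent ASV and PAD
# scores. Simple score-sum fusion, shown for illustration.

def sasv_score(asv_score: float, pad_score: float) -> float:
    """Higher means 'target speaker and bona fide speech'."""
    return asv_score + pad_score        # simple score-sum fusion

# Example trial: strong speaker match but low bona-fide confidence.
print(sasv_score(asv_score=0.9, pad_score=-1.2))
```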
1 code implementation • 7 May 2020 • Seung-bin Kim, Jee-weon Jung, Hye-jin Shim, Ju-ho Kim, Ha-Jin Yu
The proposed method segments an input utterance into several short utterances and then aggregates the segment embeddings extracted from the segmented inputs to compose a speaker embedding.
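Mechanically, this is a segment-and-aggregate pipeline: slice the utterance into short overlapping windows, embed each, and average. A sketch with a placeholder encoder (window and hop lengths are assumptions):

```python
# Segment-and-aggregate sketch: slice an utterance into short overlapping
# segments, embed each, and average the segment embeddings. The encoder is
# a placeholder for a trained speaker-embedding network.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(16000, 192))  # placeholder

wave = torch.randn(1, 16000 * 4)               # 4 s of 16 kHz audio
segments = wave.unfold(1, 16000, 8000)         # 1 s windows, 50% hop
segments = segments.squeeze(0)                 # (n_segments, 16000)

with torch.no_grad():
    seg_embs = encoder(segments)               # (n_segments, 192)
speaker_embedding = seg_embs.mean(dim=0)       # aggregated embedding
```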
2 code implementations • 1 Apr 2020 • Jee-weon Jung, Seung-bin Kim, Hye-jin Shim, Ju-ho Kim, Ha-Jin Yu
Recent advances in deep learning have facilitated the design of speaker verification systems that directly input raw waveforms.
no code implementations • 31 Jan 2020 • Jee-weon Jung, Hye-jin Shim, Hee-Soo Heo, Ha-Jin Yu
In addition, we utilize a multi-task learning framework to incorporate subsidiary information into the code.
no code implementations • 22 Oct 2019 • Hye-jin Shim, Hee-Soo Heo, Jee-weon Jung, Ha-Jin Yu
Constructing a dataset for replay spoofing detection requires a physical process of playing an utterance and re-recording it, presenting a challenge to the collection of large-scale datasets.
no code implementations • 1 Jul 2019 • Hee-Soo Heo, Jee-weon Jung, Hye-jin Shim, IL-Ho Yang, Ha-Jin Yu
In particular, the adversarial process degrades the performance of the subsidiary model by eliminating the subsidiary information in the input which, by assumption, may degrade the performance of the primary model.
1 code implementation • 23 Apr 2019 • Jee-weon Jung, Hye-jin Shim, Hee-Soo Heo, Ha-Jin Yu
To detect unrevealed characteristics that reside in replayed speech, we directly input spectrograms into an end-to-end DNN without knowledge-based intervention.
5 code implementations • 17 Apr 2019 • Jee-weon Jung, Hee-Soo Heo, Ju-ho Kim, Hye-jin Shim, Ha-Jin Yu
In this study, we explore end-to-end deep neural networks that input raw waveforms to improve various aspects: front-end speaker embedding extraction including model architecture, pre-training scheme, additional objective functions, and back-end classification.
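A raw-waveform front-end in its simplest form is a stack of strided 1-D convolutions followed by temporal pooling. The sketch below is a generic example of this design, not the exact RawNet architecture:

```python
# Minimal raw-waveform front-end: strided 1-D convolutions map the waveform
# directly to frame-level features, then pooling yields an utterance-level
# speaker embedding. A generic sketch, not any specific published model.
import torch
import torch.nn as nn

class RawFrontEnd(nn.Module):
    def __init__(self, emb_dim=256):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv1d(1, 64, kernel_size=251, stride=80), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv1d(128, emb_dim, kernel_size=3, stride=2), nn.ReLU())

    def forward(self, wave):                     # wave: (B, samples)
        frames = self.convs(wave.unsqueeze(1))   # (B, emb_dim, T')
        return frames.mean(dim=-1)               # temporal average pooling

emb = RawFrontEnd()(torch.randn(2, 32000))       # 2 s of 16 kHz audio
```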
no code implementations • 7 Feb 2019 • Hee-Soo Heo, Jee-weon Jung, IL-Ho Yang, Sung-Hyun Yoon, Hye-jin Shim, Ha-Jin Yu
Each speaker basis is designed to represent the corresponding speaker in the process of training deep neural networks.
no code implementations • 25 Oct 2018 • Jee-weon Jung, Hee-Soo Heo, Hye-jin Shim, Ha-Jin Yu
The short duration of an input utterance is one of the most critical threats that degrade the performance of speaker verification systems.
no code implementations • 29 Aug 2018 • Hye-jin Shim, Jee-weon Jung, Hee-Soo Heo, Sung-Hyun Yoon, Ha-Jin Yu
We explore the effectiveness of training a deep neural network simultaneously for replay attack spoofing detection and replay noise classification.
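Such joint training amounts to a shared encoder with one head per task and a weighted sum of losses. A minimal sketch (layer sizes, the number of noise categories, and the 0.5 weighting are illustrative):

```python
# Multi-task sketch: one shared encoder with two heads, trained jointly on
# replay spoofing detection (binary) and replay noise classification.
import torch
import torch.nn as nn

shared = nn.Sequential(nn.Linear(80, 128), nn.ReLU())
spoof_head = nn.Linear(128, 2)     # bona fide vs replayed
noise_head = nn.Linear(128, 6)     # e.g. 6 replay-noise categories

x = torch.randn(16, 80)
spoof_y = torch.randint(0, 2, (16,))
noise_y = torch.randint(0, 6, (16,))

h = shared(x)
loss = (nn.functional.cross_entropy(spoof_head(h), spoof_y)
        + 0.5 * nn.functional.cross_entropy(noise_head(h), noise_y))
loss.backward()
```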