no code implementations • 11 Dec 2023 • Sung Hwan Mun, Min Hyun Han, Canyeong Moon, Nam Soo Kim
In recent years, numerous studies have sought to further improve end-to-end neural speaker diarization (EEND) systems.
1 code implementation • 30 May 2023 • Sung Hwan Mun, Hye-jin Shim, Hemlata Tak, Xin Wang, Xuechen Liu, Md Sahidullah, Myeonghun Jeong, Min Hyun Han, Massimiliano Todisco, Kong Aik Lee, Junichi Yamagishi, Nicholas Evans, Tomi Kinnunen, Nam Soo Kim, Jee-weon Jung
Second, competitive performance should be demonstrated compared to the fusion of automatic speaker verification (ASV) and countermeasure (CM) embeddings, which outperformed single embedding solutions by a large margin in the SASV2022 challenge.
no code implementations • 12 Oct 2022 • Byoung Jin Choi, Myeonghun Jeong, Minchan Kim, Sung Hwan Mun, Nam Soo Kim
Several recently proposed text-to-speech (TTS) models have achieved human-level quality in generating speech samples, in both single-speaker and multi-speaker TTS scenarios with a set of pre-defined speakers.
no code implementations • 6 Oct 2022 • Dongjune Lee, Minchan Kim, Sung Hwan Mun, Min Hyun Han, Nam Soo Kim
For training a few-shot keyword spotting (FS-KWS) model, a large labeled dataset containing massive target keywords is known to be essential for generalizing to arbitrary target keywords with only a few enrollment samples.
no code implementations • 17 Aug 2022 • Sung Hwan Mun, Min Hyun Han, Minchan Kim, Dongjune Lee, Nam Soo Kim
The experimental results show that fine-tuning an existing pre-trained model with a disentanglement framework is valid and can further improve performance.
1 code implementation • 3 Apr 2022 • Sung Hwan Mun, Jee-weon Jung, Min Hyun Han, Nam Soo Kim
The SKA mechanism allows each convolutional layer to adaptively select the kernel size in a data-driven fashion.
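The idea of data-driven kernel-size selection can be sketched as follows. This is a minimal 1-D illustration under assumed shapes, not the paper's actual SKA architecture: each branch convolves the input with a different kernel size, a global descriptor drives a softmax over branches (the hypothetical `w_attn` plays the role of the learned attention parameters), and the branches are fused by their attention weights.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def selective_kernel_1d(x, kernels, w_attn):
    """Toy selective-kernel block on a 1-D feature sequence (a sketch,
    not the paper's implementation). `kernels` holds one filter per
    branch; `w_attn` holds one hypothetical attention parameter per
    branch."""
    # Branch outputs, each padded to the input length ("same" convolution)
    branches = [np.convolve(x, k, mode="same") for k in kernels]
    # Global descriptor: mean of the summed branch outputs
    s = np.mean(np.sum(branches, axis=0))
    # Data-driven branch weights via softmax over the descriptor
    a = softmax(w_attn * s)
    # Fuse the branches by their attention weights
    return sum(w * b for w, b in zip(a, branches))
```

In the full model the attention weights are produced per channel by small learned layers, so different inputs effectively see different receptive fields.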
no code implementations • 16 Dec 2021 • Sung Hwan Mun, Min Hyun Han, Dongjune Lee, JiHwan Kim, Nam Soo Kim
In this paper, we propose self-supervised speaker representation learning strategies, which comprise bootstrap equilibrium speaker representation learning in the front-end and uncertainty-aware probabilistic speaker embedding training in the back-end.
1 code implementation • 22 Oct 2020 • Sung Hwan Mun, Woo Hyun Kang, Min Hyun Han, Nam Soo Kim
In this paper, we propose a simple but powerful unsupervised learning method for speaker recognition, namely Contrastive Equilibrium Learning (CEL), which increases the uncertainty on nuisance factors latent in the embeddings by employing the uniformity loss.
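The uniformity loss referred to above is, in the formulation of Wang and Isola (2020), the log of the mean Gaussian potential over pairs of L2-normalized embeddings; minimizing it spreads embeddings uniformly over the unit hypersphere, which discourages nuisance factors from clustering. A minimal sketch (the temperature `t=2.0` is the common default, not a value taken from this paper):

```python
import numpy as np

def uniformity_loss(embeddings, t=2.0):
    """Uniformity loss: log-mean Gaussian potential over all distinct
    pairs of L2-normalized embeddings. Lower values indicate embeddings
    spread more uniformly on the unit hypersphere."""
    # Project each embedding onto the unit hypersphere
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Pairwise squared Euclidean distances between all embeddings
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    # Average the Gaussian potential over off-diagonal pairs only
    n = x.shape[0]
    mask = ~np.eye(n, dtype=bool)
    return np.log(np.mean(np.exp(-t * sq_dists[mask])))
```

Since each pairwise potential lies in (0, 1], the loss is always non-positive and approaches its minimum as the embeddings repel one another on the sphere.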
no code implementations • 22 Oct 2020 • Sung Hwan Mun, Woo Hyun Kang, Min Hyun Han, Nam Soo Kim
This paper describes our submission to Task 1 of the Short-duration Speaker Verification (SdSV) challenge 2020.