1 code implementation • 1 Jul 2022 • Yeonghyeon Lee, Kangwook Jang, Jahyun Goo, Youngmoon Jung, Hoirin Kim
Our method reduces the model to 23.8% of HuBERT's size and 35.9% of its inference time.
no code implementations • 2 Nov 2020 • Yeunju Choi, Youngmoon Jung, Youngjoo Suh, Hoirin Kim
Although recent neural text-to-speech (TTS) systems have achieved high-quality speech synthesis, there are cases where a TTS system generates low-quality speech, mainly caused by limited training data or information loss during knowledge distillation.
no code implementations • 6 Oct 2020 • Youngmoon Jung, Yeunju Choi, Hyungjun Lim, Hoirin Kim
At the same time, there is an increasing demand for SV systems to be robust to short speech segments, especially in noisy and reverberant environments.
no code implementations • 9 Aug 2020 • Yeunju Choi, Youngmoon Jung, Hoirin Kim
While deep learning has made impressive progress in speech synthesis and voice conversion, the assessment of the synthesized speech is still carried out by human participants.
no code implementations • 16 Jul 2020 • Yeunju Choi, Youngmoon Jung, Hoirin Kim
In this paper, we propose a multi-task learning (MTL) method to improve the performance of a MOS prediction model using the following two auxiliary tasks: spoofing detection (SD) and spoofing type classification (STC).
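The combined objective described above (a main MOS regression loss plus two weighted auxiliary classification losses) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; the weights `alpha` and `beta` are illustrative, not values from the paper.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mtl_loss(mos_pred, mos_true, sd_logits, sd_labels, stc_logits, stc_labels,
             alpha=0.5, beta=0.5):
    """MTL objective sketch: MOS regression (MSE) plus two weighted auxiliary
    cross-entropy terms for spoofing detection (SD) and spoofing type
    classification (STC). alpha/beta are illustrative weights."""
    mse = np.mean((mos_pred - mos_true) ** 2)
    n = len(sd_labels)
    sd_ce = -np.mean(np.log(softmax(sd_logits)[np.arange(n), sd_labels]))
    stc_ce = -np.mean(np.log(softmax(stc_logits)[np.arange(n), stc_labels]))
    return mse + alpha * sd_ce + beta * stc_ce

# Example: two utterances, binary SD labels, five hypothetical spoof types.
loss = mtl_loss(np.array([3.0, 4.0]), np.array([3.5, 4.0]),
                np.array([[2.0, 0.0], [0.0, 2.0]]), np.array([0, 1]),
                np.zeros((2, 5)), np.array([0, 1]))
```

The auxiliary heads share the predictor's representation, so gradients from SD and STC regularize the features used for MOS regression.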
no code implementations • 8 May 2020 • Myunghun Jung, Youngmoon Jung, Jahyun Goo, Hoirin Kim
Keyword spotting (KWS) and speaker verification (SV) have been studied independently although it is known that acoustic and speaker domains are complementary.
no code implementations • 7 Apr 2020 • Youngmoon Jung, Seong Min Kye, Yeunju Choi, Myunghun Jung, Hoirin Kim
In this approach, we obtain a speaker embedding vector by pooling single-scale features that are extracted from the last layer of a speaker feature extractor.
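The baseline pooling described here (collapsing frame-level features from the last layer into one utterance-level embedding) is commonly realized as statistics pooling. A minimal sketch, assuming mean-and-standard-deviation pooling rather than any specific extractor from the paper:

```python
import numpy as np

def statistics_pooling(frame_feats):
    """Pool variable-length frame-level features of shape (T, D) into a
    fixed utterance-level vector of shape (2*D,) by concatenating the
    per-dimension mean and standard deviation over time."""
    mean = frame_feats.mean(axis=0)
    std = frame_feats.std(axis=0)
    return np.concatenate([mean, std])

feats = np.random.randn(200, 64)   # 200 frames, 64-dim last-layer features
embedding = statistics_pooling(feats)  # shape (128,)
```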
1 code implementation • 6 Apr 2020 • Seong Min Kye, Youngmoon Jung, Hae Beom Lee, Sung Ju Hwang, Hoirin Kim
By combining these two learning schemes, our model outperforms existing state-of-the-art speaker verification models trained with a standard supervised learning framework on short utterances (1-2 seconds) from the VoxCeleb datasets.
1 code implementation • 27 Mar 2020 • Joohyung Lee, Youngmoon Jung, Hoirin Kim
The results show that the focal loss can improve the performance in various imbalance situations compared to the cross entropy loss, a commonly used loss function in VAD.
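The focal loss referred to here down-weights easy examples so training concentrates on hard (often minority-class) frames. A minimal NumPy sketch of the standard binary focal loss; `gamma` and `alpha` are common defaults, not values from the paper:

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss: scales cross entropy by (1 - p_t)^gamma, where
    p_t is the probability assigned to the true class, so confident
    (easy) frames contribute little to the gradient."""
    p = np.clip(p, eps, 1 - eps)
    p_t = np.where(y == 1, p, 1 - p)          # prob. of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)  # class-balance weight
    return np.mean(-alpha_t * (1 - p_t) ** gamma * np.log(p_t))
```

With `gamma=0` and a symmetric `alpha`, this reduces to (scaled) cross entropy, which is why focal loss is a drop-in replacement in imbalanced VAD training.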
no code implementations • 1 Oct 2019 • Myunghun Jung, Hyungjun Lim, Jahyun Goo, Youngmoon Jung, Hoirin Kim
Acoustic word embeddings (fixed-dimensional vector representations of arbitrary-length words) have attracted increasing interest in query-by-example spoken term detection.
no code implementations • 26 Sep 2019 • Youngmoon Jung, Yeunju Choi, Hoirin Kim
The first approach is soft VAD, which performs a soft selection of frame-level features extracted from a speaker feature extractor.
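The soft selection described here can be sketched as posterior-weighted pooling: rather than hard-dropping non-speech frames, each frame-level feature is weighted by its speech posterior before aggregation. A minimal NumPy illustration, not the authors' implementation:

```python
import numpy as np

def soft_vad_pooling(frame_feats, speech_posteriors):
    """Soft VAD pooling: weight each frame-level feature (T, D) by its
    speech posterior (T,), normalize the weights, and average-pool.
    Non-speech frames are attenuated instead of discarded."""
    w = speech_posteriors / (speech_posteriors.sum() + 1e-8)
    return (w[:, None] * frame_feats).sum(axis=0)
```

Because the weighting is differentiable, the VAD and speaker feature extractor can be trained jointly, unlike a hard frame-dropping front end.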
no code implementations • 19 Jun 2019 • Youngmoon Jung, Younggwan Kim, Hyungjun Lim, Yeunju Choi, Hoirin Kim
Furthermore, we apply deep length normalization by augmenting the loss function with ring loss.
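The ring-loss term mentioned here penalizes the deviation of each embedding's L2 norm from a target radius, giving a soft, learned length normalization. A minimal sketch of the standard formulation; the weight value is illustrative, and in practice `R` is a trainable scalar:

```python
import numpy as np

def ring_loss(embeddings, R, weight=0.01):
    """Ring loss: mean squared deviation of embedding L2 norms (rows of
    `embeddings`, shape (N, D)) from target radius R, scaled by `weight`.
    Added to the main loss, it pulls all embeddings toward norm R."""
    norms = np.linalg.norm(embeddings, axis=1)
    return weight * np.mean((norms - R) ** 2)
```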
no code implementations • 7 Nov 2018 • Hyungjun Lim, Younggwan Kim, Youngmoon Jung, Myunghun Jung, Hoirin Kim
Previous research on acoustic word embeddings for query-by-example spoken term detection has shown remarkable performance improvements from using a triplet network.
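A triplet network trains on (anchor, positive, negative) triples so that same-word embeddings end up closer than different-word embeddings by a margin. A minimal NumPy sketch of the standard triplet loss; the margin value is illustrative:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Triplet loss on embedding batches of shape (N, D): pull same-word
    (anchor, positive) pairs together and push different-word
    (anchor, negative) pairs at least `margin` farther apart."""
    d_ap = np.linalg.norm(anchor - positive, axis=1)
    d_an = np.linalg.norm(anchor - negative, axis=1)
    return np.mean(np.maximum(0.0, d_ap - d_an + margin))
```

The loss is zero once every negative is at least `margin` farther from the anchor than its positive, so only violating triples drive learning.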