no code implementations • SEMEVAL 2018 • Won Ik Cho, Woo Hyun Kang, Nam Soo Kim
This paper proposes a novel feature extraction process for SemEval task 3: Irony detection in English tweets.
1 code implementation • 10 Oct 2018 • Won Ik Cho, Young Ki Moon, Woo Hyun Kang, Nam Soo Kim
Intention identification is a core issue in dialog management.
1 code implementation • 31 Oct 2018 • Won Ik Cho, Sung Jun Cheon, Woo Hyun Kang, Ji Won Kim, Nam Soo Kim
For readability and disambiguation of written text, appropriate word segmentation is recommended for documentation, and this also holds for digitized texts.
2 code implementations • 10 Nov 2018 • Won Ik Cho, Hyeon Seung Lee, Ji Won Yoon, Seok Min Kim, Nam Soo Kim
This paper suggests a system which identifies the inherent intention of a spoken utterance given its transcript, in some cases using auxiliary acoustic features.
1 code implementation • WS 2019 • Won Ik Cho, Ji Won Kim, Seok Min Kim, Nam Soo Kim
However, detection and evaluation of gender bias in machine translation systems have not yet been thoroughly investigated, as the task is cross-lingual and challenging to define.
2 code implementations • 31 May 2019 • Won Ik Cho, Seok Min Kim, Nam Soo Kim
Different from the writing systems of many Romance and Germanic languages, some languages or language families show complex conjunct forms in character composition.
1 code implementation • 21 Oct 2019 • Won Ik Cho, Jeonghwa Cho, Woo Hyun Kang, Nam Soo Kim
Analyzing how human beings resolve syntactic ambiguity has long been an issue of interest in the field of linguistics.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Won Ik Cho, Young Ki Moon, Sangwhan Moon, Seok Min Kim, Nam Soo Kim
Modern dialog managers face the challenge of having to fulfill human-level conversational skills as part of common user expectations, including but not limited to discourse with no clear objective.
no code implementations • LREC 2020 • Won Ik Cho, Jong In Kim, Young Ki Moon, Nam Soo Kim
Assessing the similarity of sentences and detecting paraphrases are essential tasks in both theory and practice, but building a reliable dataset requires substantial resources.
no code implementations • LREC 2020 • Won Ik Cho, Seok Min Kim, Nam Soo Kim
Code-mixed grapheme-to-phoneme (G2P) conversion is a crucial issue for modern speech recognition and synthesis tasks, but it has seldom been investigated at the sentence level in the literature.
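To make the code-mixed G2P setting concrete, here is a minimal sketch of lexicon-based conversion that routes each token by script. The function names, the ASCII-based script test, and the toy lexicons are illustrative assumptions, not the paper's actual method, which operates at the sentence level:

```python
def is_latin(token):
    # Crude script detector: treat all-ASCII tokens as English.
    return all(c.isascii() for c in token)

def code_mixed_g2p(sentence, eng_lexicon, kor_lexicon):
    # Route each token to a per-language pronunciation lexicon based on
    # its script. A real sentence-level system must additionally handle
    # OOV tokens, pronunciation changes across language boundaries, and
    # mapping between the two phone sets.
    phones = []
    for token in sentence.split():
        lexicon = eng_lexicon if is_latin(token) else kor_lexicon
        phones.extend(lexicon.get(token, ["<unk>"]))
    return phones
```

Token-by-token routing like this is exactly what breaks down at the sentence level, which is the gap the paper points at.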
no code implementations • 17 May 2020 • Won Ik Cho, Dong-Hyun Kwak, Ji Won Yoon, Nam Soo Kim
We transfer the knowledge from a concrete Transformer-based text LM to an SLU module which can face a data shortage, based on recent cross-modal distillation methodologies.
1 code implementation • 8 Jun 2020 • Hyeongju Kim, Hyeonseung Lee, Woo Hyun Kang, Sung Jun Cheon, Byoung Jin Choi, Nam Soo Kim
In recent years, various flow-based generative models have been proposed to generate high-fidelity waveforms in real-time.
1 code implementation • NeurIPS 2020 • Hyeongju Kim, Hyeonseung Lee, Woo Hyun Kang, Joun Yeop Lee, Nam Soo Kim
Flow-based generative models are composed of invertible transformations between two random variables of the same dimension.
Ranked #3 on Point Cloud Generation on ShapeNet Airplane
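The defining property of such flows is that an invertible transform gives an exact likelihood via the change-of-variables formula. A one-dimensional affine sketch (function and variable names are mine, not the paper's):

```python
import math

def affine_flow_logprob(x, scale, shift):
    # Invertible map to the base variable: z = (x - shift) / scale.
    # Change of variables: log p(x) = log N(z; 0, 1) + log|dz/dx|,
    # and for this affine map log|dz/dx| = -log(scale).
    z = (x - shift) / scale
    log_base = -0.5 * (z * z + math.log(2.0 * math.pi))
    return log_base - math.log(scale)
```

This single layer just recovers the Gaussian N(shift, scale^2) density; practical flows stack many invertible layers (e.g. coupling layers) so the composite map stays bijective while the log-determinant remains cheap to compute.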
no code implementations • 3 Aug 2020 • Ji Won Yoon, Hyeonseung Lee, Hyung Yong Kim, Won Ik Cho, Nam Soo Kim
To reduce this computational burden, knowledge distillation (KD), which is a popular model compression method, has been used to transfer knowledge from a deep and complex model (teacher) to a shallower and simpler model (student).
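A common instantiation of the teacher-student transfer described above is the temperature-scaled distillation loss of Hinton et al. (2015); the sketch below is that generic formulation, not this paper's specific objective:

```python
import math

def softened(logits, temperature):
    # Temperature-scaled softmax: higher T exposes more of the
    # teacher's "dark knowledge" about relative class similarities.
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL divergence from the softened teacher distribution to the
    # softened student distribution, scaled by T^2 so gradients keep
    # a comparable magnitude across temperatures.
    p = softened(teacher_logits, temperature)
    q = softened(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return temperature ** 2 * kl
```

In training, this term is typically mixed with the ordinary cross-entropy on hard labels.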
no code implementations • 22 Oct 2020 • Sung Hwan Mun, Woo Hyun Kang, Min Hyun Han, Nam Soo Kim
This paper describes our submission to Task 1 of the Short-duration Speaker Verification (SdSV) challenge 2020.
1 code implementation • 22 Oct 2020 • Sung Hwan Mun, Woo Hyun Kang, Min Hyun Han, Nam Soo Kim
In this paper, we propose a simple but powerful unsupervised learning method for speaker recognition, namely Contrastive Equilibrium Learning (CEL), which increases the uncertainty on nuisance factors latent in the embeddings by employing the uniformity loss.
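The uniformity loss mentioned above is the log of the mean Gaussian potential over embedding pairs (Wang & Isola, 2020); a toy version on plain Python lists, assuming unit-norm embeddings (the surrounding CEL training loop is not shown):

```python
import math

def uniformity_loss(embeddings, t=2.0):
    # Log of the mean pairwise Gaussian potential. Minimizing it pushes
    # unit-norm embeddings to spread uniformly over the hypersphere,
    # which raises the uncertainty on nuisance factors latent in them.
    potentials = []
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            sq_dist = sum((a - b) ** 2
                          for a, b in zip(embeddings[i], embeddings[j]))
            potentials.append(math.exp(-t * sq_dist))
    return math.log(sum(potentials) / len(potentials))
```

Spread-out embeddings score lower (better) than collapsed ones, which is the behavior the loss is there to enforce.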
no code implementations • 6 Feb 2021 • Hyeongju Kim, Woo Hyun Kang, Hyeonseung Lee, Nam Soo Kim
Photoplethysmogram (PPG) signal-based blood pressure (BP) estimation is a promising candidate for modern BP measurements, as PPG signals can be easily obtained from wearable devices in a non-invasive manner, allowing quick BP measurement.
1 code implementation • LREC 2022 • Won Ik Cho, Sangwhan Moon, Jong In Kim, Seok Min Kim, Nam Soo Kim
Paraphrasing is often performed with less concern for controlled style conversion.
no code implementations • 1 Apr 2021 • Minchan Kim, Sung Jun Cheon, Byoung Jin Choi, Jong Jin Kim, Nam Soo Kim
In this work, we propose StyleTagging-TTS (ST-TTS), a novel expressive TTS model that utilizes a style tag written in natural language.
1 code implementation • 3 Apr 2021 • Myeonghun Jeong, Hyeongju Kim, Sung Jun Cheon, Byoung Jin Choi, Nam Soo Kim
Although neural text-to-speech (TTS) models have attracted a lot of attention and succeeded in generating human-like speech, there is still room for improvement in naturalness and architectural efficiency.
1 code implementation • 6 Jul 2021 • Won Ik Cho, Seok Min Kim, Hyunchang Cho, Nam Soo Kim
Most speech-to-text (S2T) translation studies use English speech as a source, which makes it difficult for non-English speakers to take advantage of the S2T technologies.
no code implementations • 5 Nov 2021 • Ji Won Yoon, Hyung Yong Kim, Hyeonseung Lee, Sunghwan Ahn, Nam Soo Kim
Extending this supervised scheme further, we introduce a new type of teacher model for connectionist temporal classification (CTC)-based sequence models, namely Oracle Teacher, that leverages both the source inputs and the output labels as the teacher model's input.
no code implementations • 16 Dec 2021 • Sung Hwan Mun, Min Hyun Han, Dongjune Lee, JiHwan Kim, Nam Soo Kim
In this paper, we propose self-supervised speaker representation learning strategies, which comprise bootstrap equilibrium speaker representation learning in the front-end and uncertainty-aware probabilistic speaker embedding training in the back-end.
no code implementations • 29 Mar 2022 • Minchan Kim, Myeonghun Jeong, Byoung Jin Choi, Sunghwan Ahn, Joun Yeop Lee, Nam Soo Kim
The experimental results verify the effectiveness of the proposed method in terms of naturalness, intelligibility, and speaker generalization.
1 code implementation • 3 Apr 2022 • Sung Hwan Mun, Jee-weon Jung, Min Hyun Han, Nam Soo Kim
The SKA mechanism allows each convolutional layer to adaptively select the kernel size in a data-driven fashion.
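The selection step can be sketched as attention-weighted mixing of parallel convolution branches. The helper names and toy vectors below are mine; in the paper, a small bottleneck network produces the attention scores from globally pooled features, so each layer picks its receptive field per input:

```python
import math

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def selective_kernel_combine(branch_outputs, attention_logits):
    # branch_outputs: one feature vector per kernel size (e.g. k=3,5,7),
    # all computed on the same input.
    # attention_logits: one data-driven score per branch. Softmax
    # weights blend the branches, softly selecting a kernel size.
    weights = softmax(attention_logits)
    dim = len(branch_outputs[0])
    return [sum(w * branch[i]
                for w, branch in zip(weights, branch_outputs))
            for i in range(dim)]
```

With equal scores the layer averages the branches; with one dominant score it effectively commits to that kernel size.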
no code implementations • 13 Apr 2022 • Ji Won Yoon, Beom Jun Woo, Nam Soo Kim
Pre-training with self-supervised models, such as Hidden-unit BERT (HuBERT) and wav2vec 2.0, has brought significant improvements in automatic speech recognition (ASR).
no code implementations • 17 Aug 2022 • Sung Hwan Mun, Min Hyun Han, Minchan Kim, Dongjune Lee, Nam Soo Kim
The experimental results show that fine-tuning an existing pre-trained model with a disentanglement framework is valid and can further improve performance.
no code implementations • 6 Oct 2022 • Dongjune Lee, Minchan Kim, Sung Hwan Mun, Min Hyun Han, Nam Soo Kim
For training a few-shot keyword spotting (FS-KWS) model, a large labeled dataset containing massive target keywords is known to be essential for generalizing to arbitrary target keywords with only a few enrollment samples.
no code implementations • 12 Oct 2022 • Byoung Jin Choi, Myeonghun Jeong, Minchan Kim, Sung Hwan Mun, Nam Soo Kim
Several recently proposed text-to-speech (TTS) models have achieved human-level quality in single-speaker and multi-speaker TTS scenarios with a set of pre-defined speakers.
no code implementations • 28 Nov 2022 • Ji Won Yoon, Beom Jun Woo, Sunghwan Ahn, Hyeonseung Lee, Nam Soo Kim
Recently, advances in deep learning have brought considerable improvement to the end-to-end speech recognition field, simplifying the traditional pipeline while producing promising results.
no code implementations • 30 Nov 2022 • Byoung Jin Choi, Myeonghun Jeong, Joun Yeop Lee, Nam Soo Kim
Zero-shot multi-speaker text-to-speech (ZSM-TTS) models aim to generate a speech sample with the voice characteristic of an unseen speaker.
no code implementations • 1 Apr 2023 • Won Ik Cho, Yoon Kyung Lee, Seoyeon Bae, JiHwan Kim, Sangah Park, Moosung Kim, Sowon Hahn, Nam Soo Kim
Building a natural language dataset requires caution, since word semantics are vulnerable to subtle text changes and to how the annotated concept is defined.
1 code implementation • 30 May 2023 • Sung Hwan Mun, Hye-jin Shim, Hemlata Tak, Xin Wang, Xuechen Liu, Md Sahidullah, Myeonghun Jeong, Min Hyun Han, Massimiliano Todisco, Kong Aik Lee, Junichi Yamagishi, Nicholas Evans, Tomi Kinnunen, Nam Soo Kim, Jee-weon Jung
Second, competitive performance should be demonstrated compared to the fusion of automatic speaker verification (ASV) and countermeasure (CM) embeddings, which outperformed single embedding solutions by a large margin in the SASV2022 challenge.
no code implementations • 14 Jun 2023 • Ji Won Yoon, Seok Min Kim, Nam Soo Kim
Self-supervised learning (SSL) has shown significant progress in speech processing tasks.
no code implementations • 14 Jun 2023 • Ji Won Yoon, Sunghwan Ahn, Hyeonseung Lee, Minchan Kim, Seok Min Kim, Nam Soo Kim
We introduce EM-Network, a novel self-distillation approach that effectively leverages target information for supervised sequence-to-sequence (seq2seq) learning.
no code implementations • 6 Nov 2023 • Minchan Kim, Myeonghun Jeong, Byoung Jin Choi, Dongjune Lee, Nam Soo Kim
We introduce a text-to-speech (TTS) framework based on a neural transducer.
no code implementations • 11 Dec 2023 • Sung Hwan Mun, Min Hyun Han, Canyeong Moon, Nam Soo Kim
In recent years, there have been studies to further improve the end-to-end neural speaker diarization (EEND) systems.
no code implementations • 2 Jan 2024 • Myeonghun Jeong, Minchan Kim, Joun Yeop Lee, Nam Soo Kim
We present a fast and high-quality codec language model for parallel audio generation.
no code implementations • 3 Jan 2024 • Minchan Kim, Myeonghun Jeong, Byoung Jin Choi, Semin Kim, Joun Yeop Lee, Nam Soo Kim
We also delve into the inference speed and prosody control capabilities of our approach, highlighting the potential of neural transducers in TTS frameworks.
no code implementations • LREC 2022 • Sangwhan Moon, Won Ik Cho, Hye Joo Han, Naoaki Okazaki, Nam Soo Kim
As this problem originates from the conventional scheme used when creating a POS tagging corpus, we propose an improvement to the existing scheme, which makes it friendlier to generative tasks.