Search Results for author: Nam Soo Kim

Found 42 papers, 16 papers with code

OpenKorPOS: Democratizing Korean Tokenization with Voting-Based Open Corpus Annotation

no code implementations LREC 2022 Sangwhan Moon, Won Ik Cho, Hye Joo Han, Naoaki Okazaki, Nam Soo Kim

As this problem originates from the conventional scheme used when creating a POS tagging corpus, we propose an improvement to the existing scheme, which makes it friendlier to generative tasks.

POS POS Tagging +1

HILCodec: High Fidelity and Lightweight Neural Audio Codec

1 code implementation 8 May 2024 Sunghwan Ahn, Beom Jun Woo, Min Hyun Han, Chanyeong Moon, Nam Soo Kim

The recent advancement of end-to-end neural audio codecs enables compressing audio at very low bitrates while reconstructing the output audio with high fidelity.

Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction

no code implementations 3 Jan 2024 Minchan Kim, Myeonghun Jeong, Byoung Jin Choi, Semin Kim, Joun Yeop Lee, Nam Soo Kim

We also delve into the inference speed and prosody control capabilities of our approach, highlighting the potential of neural transducers in TTS frameworks.

EEND-DEMUX: End-to-End Neural Speaker Diarization via Demultiplexed Speaker Embeddings

no code implementations 11 Dec 2023 Sung Hwan Mun, Min Hyun Han, Canyeong Moon, Nam Soo Kim

In recent years, there have been studies to further improve the end-to-end neural speaker diarization (EEND) systems.

Speaker Diarization

EM-Network: Oracle Guided Self-distillation for Sequence Learning

no code implementations 14 Jun 2023 Ji Won Yoon, Sunghwan Ahn, Hyeonseung Lee, Minchan Kim, Seok Min Kim, Nam Soo Kim

We introduce EM-Network, a novel self-distillation approach that effectively leverages target information for supervised sequence-to-sequence (seq2seq) learning.

Decoder Machine Translation +2

Towards single integrated spoofing-aware speaker verification embeddings

1 code implementation 30 May 2023 Sung Hwan Mun, Hye-jin Shim, Hemlata Tak, Xin Wang, Xuechen Liu, Md Sahidullah, Myeonghun Jeong, Min Hyun Han, Massimiliano Todisco, Kong Aik Lee, Junichi Yamagishi, Nicholas Evans, Tomi Kinnunen, Nam Soo Kim, Jee-weon Jung

Second, competitive performance should be demonstrated compared to the fusion of automatic speaker verification (ASV) and countermeasure (CM) embeddings, which outperformed single embedding solutions by a large margin in the SASV2022 challenge.

Speaker Verification

When Crowd Meets Persona: Creating a Large-Scale Open-Domain Persona Dialogue Corpus

no code implementations 1 Apr 2023 Won Ik Cho, Yoon Kyung Lee, Seoyeon Bae, JiHwan Kim, Sangah Park, Moosung Kim, Sowon Hahn, Nam Soo Kim

Building a natural language dataset requires caution, since word semantics are vulnerable to subtle text changes and to the definition of the annotated concept.

Dialogue Generation Question Answering +2

SNAC: Speaker-normalized affine coupling layer in flow-based architecture for zero-shot multi-speaker text-to-speech

no code implementations 30 Nov 2022 Byoung Jin Choi, Myeonghun Jeong, Joun Yeop Lee, Nam Soo Kim

Zero-shot multi-speaker text-to-speech (ZSM-TTS) models aim to generate a speech sample with the voice characteristic of an unseen speaker.

Speech Synthesis

Inter-KD: Intermediate Knowledge Distillation for CTC-Based Automatic Speech Recognition

no code implementations 28 Nov 2022 Ji Won Yoon, Beom Jun Woo, Sunghwan Ahn, Hyeonseung Lee, Nam Soo Kim

Recently, the advance in deep learning has brought a considerable improvement in the end-to-end speech recognition field, simplifying the traditional pipeline while producing promising results.

Automatic Speech Recognition (ASR) +4

Adversarial Speaker-Consistency Learning Using Untranscribed Speech Data for Zero-Shot Multi-Speaker Text-to-Speech

no code implementations 12 Oct 2022 Byoung Jin Choi, Myeonghun Jeong, Minchan Kim, Sung Hwan Mun, Nam Soo Kim

Several recently proposed text-to-speech (TTS) models have succeeded in generating speech samples with human-level quality in single-speaker and multi-speaker TTS scenarios with a set of pre-defined speakers.

Fully Unsupervised Training of Few-shot Keyword Spotting

no code implementations 6 Oct 2022 Dongjune Lee, Minchan Kim, Sung Hwan Mun, Min Hyun Han, Nam Soo Kim

For training a few-shot keyword spotting (FS-KWS) model, a large labeled dataset containing a massive number of target keywords is known to be essential for generalizing to arbitrary target keywords with only a few enrollment samples.

Keyword Spotting Metric Learning +1

Disentangled Speaker Representation Learning via Mutual Information Minimization

no code implementations 17 Aug 2022 Sung Hwan Mun, Min Hyun Han, Minchan Kim, Dongjune Lee, Nam Soo Kim

The experimental results show that fine-tuning with a disentanglement framework on an existing pre-trained model is valid and can further improve performance.

Disentanglement Speaker Recognition +2

HuBERT-EE: Early Exiting HuBERT for Efficient Speech Recognition

no code implementations 13 Apr 2022 Ji Won Yoon, Beom Jun Woo, Nam Soo Kim

Pre-training with self-supervised models, such as Hidden-unit BERT (HuBERT) and wav2vec 2.0, has brought significant improvements in automatic speech recognition (ASR).

Automatic Speech Recognition (ASR) +1

Frequency and Multi-Scale Selective Kernel Attention for Speaker Verification

1 code implementation 3 Apr 2022 Sung Hwan Mun, Jee-weon Jung, Min Hyun Han, Nam Soo Kim

The SKA mechanism allows each convolutional layer to adaptively select the kernel size in a data-driven fashion.

Speaker Verification

Bootstrap Equilibrium and Probabilistic Speaker Representation Learning for Self-supervised Speaker Verification

no code implementations 16 Dec 2021 Sung Hwan Mun, Min Hyun Han, Dongjune Lee, JiHwan Kim, Nam Soo Kim

In this paper, we propose self-supervised speaker representation learning strategies, which consist of bootstrap equilibrium speaker representation learning in the front-end and uncertainty-aware probabilistic speaker embedding training in the back-end.

Contrastive Learning Representation Learning +1

Oracle Teacher: Leveraging Target Information for Better Knowledge Distillation of CTC Models

no code implementations 5 Nov 2021 Ji Won Yoon, Hyung Yong Kim, Hyeonseung Lee, Sunghwan Ahn, Nam Soo Kim

Extending this supervised scheme further, we introduce a new type of teacher model for connectionist temporal classification (CTC)-based sequence models, namely Oracle Teacher, that leverages both the source inputs and the output labels as the teacher model's input.

Knowledge Distillation Machine Translation +5

Kosp2e: Korean Speech to English Translation Corpus

1 code implementation 6 Jul 2021 Won Ik Cho, Seok Min Kim, Hyunchang Cho, Nam Soo Kim

Most speech-to-text (S2T) translation studies use English speech as a source, which makes it difficult for non-English speakers to take advantage of the S2T technologies.

Speech Recognition +1

Diff-TTS: A Denoising Diffusion Model for Text-to-Speech

1 code implementation 3 Apr 2021 Myeonghun Jeong, Hyeongju Kim, Sung Jun Cheon, Byoung Jin Choi, Nam Soo Kim

Although neural text-to-speech (TTS) models have attracted a lot of attention and succeeded in generating human-like speech, there is still room for improvement in their naturalness and architectural efficiency.

Denoising Speech Synthesis

Expressive Text-to-Speech using Style Tag

no code implementations 1 Apr 2021 Minchan Kim, Sung Jun Cheon, Byoung Jin Choi, Jong Jin Kim, Nam Soo Kim

In this work, we propose StyleTagging-TTS (ST-TTS), a novel expressive TTS model that utilizes a style tag written in natural language.

Language Modelling TAG

Continuous Monitoring of Blood Pressure with Evidential Regression

no code implementations 6 Feb 2021 Hyeongju Kim, Woo Hyun Kang, Hyeonseung Lee, Nam Soo Kim

Photoplethysmogram (PPG) signal-based blood pressure (BP) estimation is a promising candidate for modern BP measurements, as PPG signals can be easily obtained from wearable devices in a non-invasive manner, allowing quick BP measurement.


Unsupervised Representation Learning for Speaker Recognition via Contrastive Equilibrium Learning

1 code implementation 22 Oct 2020 Sung Hwan Mun, Woo Hyun Kang, Min Hyun Han, Nam Soo Kim

In this paper, we propose a simple but powerful unsupervised learning method for speaker recognition, namely Contrastive Equilibrium Learning (CEL), which increases the uncertainty on nuisance factors latent in the embeddings by employing the uniformity loss.

Representation Learning Speaker Recognition +1

Robust Text-Dependent Speaker Verification via Character-Level Information Preservation for the SdSV Challenge 2020

no code implementations 22 Oct 2020 Sung Hwan Mun, Woo Hyun Kang, Min Hyun Han, Nam Soo Kim

This paper describes our submission to Task 1 of the Short-duration Speaker Verification (SdSV) challenge 2020.

Audio and Speech Processing Sound

TutorNet: Towards Flexible Knowledge Distillation for End-to-End Speech Recognition

no code implementations 3 Aug 2020 Ji Won Yoon, Hyeonseung Lee, Hyung Yong Kim, Won Ik Cho, Nam Soo Kim

To reduce this computational burden, knowledge distillation (KD), which is a popular model compression method, has been used to transfer knowledge from a deep and complex model (teacher) to a shallower and simpler model (student).

Knowledge Distillation Model Compression +3

WaveNODE: A Continuous Normalizing Flow for Speech Synthesis

1 code implementation 8 Jun 2020 Hyeongju Kim, Hyeonseung Lee, Woo Hyun Kang, Sung Jun Cheon, Byoung Jin Choi, Nam Soo Kim

In recent years, various flow-based generative models have been proposed to generate high-fidelity waveforms in real-time.

Speech Synthesis

Speech to Text Adaptation: Towards an Efficient Cross-Modal Distillation

no code implementations 17 May 2020 Won Ik Cho, Dong-Hyun Kwak, Ji Won Yoon, Nam Soo Kim

We transfer the knowledge from a concrete Transformer-based text LM to an SLU module which can face a data shortage, based on recent cross-modal distillation methodologies.

Computational Efficiency speech-recognition +2

Towards an Efficient Code-Mixed Grapheme-to-Phoneme Conversion in an Agglutinative Language: A Case Study on To-Korean Transliteration

no code implementations LREC 2020 Won Ik Cho, Seok Min Kim, Nam Soo Kim

Code-mixed grapheme-to-phoneme (G2P) conversion is a crucial issue for modern speech recognition and synthesis tasks, but it has seldom been investigated at the sentence level in the literature.

Philosophy Sentence +3

Discourse Component to Sentence (DC2S): An Efficient Human-Aided Construction of Paraphrase and Sentence Similarity Dataset

no code implementations LREC 2020 Won Ik Cho, Jong In Kim, Young Ki Moon, Nam Soo Kim

Assessing the similarity of sentences and detecting paraphrases are essential tasks both in theory and in practice, but building a reliable dataset is resource-intensive.

Natural Language Inference Paraphrase Generation +2

Machines Getting with the Program: Understanding Intent Arguments of Non-Canonical Directives

1 code implementation Findings of the Association for Computational Linguistics 2020 Won Ik Cho, Young Ki Moon, Sangwhan Moon, Seok Min Kim, Nam Soo Kim

Modern dialog managers face the challenge of having to fulfill human-level conversational skills as part of common user expectations, including but not limited to discourse with no clear objective.

Text Matters but Speech Influences: A Computational Analysis of Syntactic Ambiguity Resolution

1 code implementation 21 Oct 2019 Won Ik Cho, Jeonghwa Cho, Woo Hyun Kang, Nam Soo Kim

Analyzing how human beings resolve syntactic ambiguity has long been an issue of interest in the field of linguistics.

Sentence Spoken Language Understanding

Investigating an Effective Character-level Embedding in Korean Sentence Classification

2 code implementations 31 May 2019 Won Ik Cho, Seok Min Kim, Nam Soo Kim

Different from the writing systems of many Romance and Germanic languages, some languages or language families show complex conjunct forms in character composition.

Classification General Classification +3

On Measuring Gender Bias in Translation of Gender-neutral Pronouns

1 code implementation WS 2019 Won Ik Cho, Ji Won Kim, Seok Min Kim, Nam Soo Kim

However, the detection and evaluation of gender bias in machine translation systems have not yet been thoroughly investigated, because the task is cross-lingual and challenging to define.

Ethics Image Captioning +3

Speech Intention Understanding in a Head-final Language: A Disambiguation Utilizing Intonation-dependency

2 code implementations 10 Nov 2018 Won Ik Cho, Hyeon Seung Lee, Ji Won Yoon, Seok Min Kim, Nam Soo Kim

This paper suggests a system which identifies the inherent intention of a spoken utterance given its transcript, in some cases using auxiliary acoustic features.


Giving Space to Your Message: Assistive Word Segmentation for the Electronic Typing of Digital Minorities

1 code implementation 31 Oct 2018 Won Ik Cho, Sung Jun Cheon, Woo Hyun Kang, Ji Won Kim, Nam Soo Kim

For readability and disambiguation of written text, appropriate word segmentation is recommended in documentation, and this also holds for digitized texts.

