Search Results for author: Minchan Kim

Found 10 papers, 0 papers with code

Expressive Text-to-Speech using Style Tag

no code implementations • 1 Apr 2021 • Minchan Kim, Sung Jun Cheon, Byoung Jin Choi, Jong Jin Kim, Nam Soo Kim

In this work, we propose StyleTagging-TTS (ST-TTS), a novel expressive TTS model that utilizes a style tag written in natural language.

Language Modelling TAG

Transfer Learning Framework for Low-Resource Text-to-Speech using a Large-Scale Unlabeled Speech Corpus

no code implementations • 29 Mar 2022 • Minchan Kim, Myeonghun Jeong, Byoung Jin Choi, Sunghwan Ahn, Joun Yeop Lee, Nam Soo Kim

The experimental results verify the effectiveness of the proposed method in terms of naturalness, intelligibility, and speaker generalization.

Transfer Learning Zero-Shot Multi-Speaker TTS

Disentangled Speaker Representation Learning via Mutual Information Minimization

no code implementations • 17 Aug 2022 • Sung Hwan Mun, Min Hyun Han, Minchan Kim, Dongjune Lee, Nam Soo Kim

The experimental results show that fine-tuning with a disentanglement framework on an existing pre-trained model is valid and can further improve performance.

Disentanglement Speaker Recognition +2

Fully Unsupervised Training of Few-shot Keyword Spotting

no code implementations • 6 Oct 2022 • Dongjune Lee, Minchan Kim, Sung Hwan Mun, Min Hyun Han, Nam Soo Kim

For training a few-shot keyword spotting (FS-KWS) model, a large labeled dataset containing a massive number of target keywords is known to be essential for generalizing to arbitrary target keywords with only a few enrollment samples.

Keyword Spotting Metric Learning +1

Adversarial Speaker-Consistency Learning Using Untranscribed Speech Data for Zero-Shot Multi-Speaker Text-to-Speech

no code implementations • 12 Oct 2022 • Byoung Jin Choi, Myeonghun Jeong, Minchan Kim, Sung Hwan Mun, Nam Soo Kim

Several recently proposed text-to-speech (TTS) models have succeeded in generating speech samples of human-level quality in single-speaker and multi-speaker TTS scenarios with a set of pre-defined speakers.

EM-Network: Oracle Guided Self-distillation for Sequence Learning

no code implementations • 14 Jun 2023 • Ji Won Yoon, Sunghwan Ahn, Hyeonseung Lee, Minchan Kim, Seok Min Kim, Nam Soo Kim

We introduce EM-Network, a novel self-distillation approach that effectively leverages target information for supervised sequence-to-sequence (seq2seq) learning.

Machine Translation speech-recognition +1

Transduce and Speak: Neural Transducer for Text-to-Speech with Semantic Token Prediction

no code implementations • 6 Nov 2023 • Minchan Kim, Myeonghun Jeong, Byoung Jin Choi, Dongjune Lee, Nam Soo Kim

We introduce a text-to-speech (TTS) framework based on a neural transducer.

Efficient Parallel Audio Generation using Group Masked Language Modeling

no code implementations • 2 Jan 2024 • Myeonghun Jeong, Minchan Kim, Joun Yeop Lee, Nam Soo Kim

We present a fast and high-quality codec language model for parallel audio generation.

Audio Generation Computational Efficiency +2

Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction

no code implementations • 3 Jan 2024 • Minchan Kim, Myeonghun Jeong, Byoung Jin Choi, Semin Kim, Joun Yeop Lee, Nam Soo Kim

We also delve into the inference speed and prosody control capabilities of our approach, highlighting the potential of neural transducers in TTS frameworks.

Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models

no code implementations • 24 Mar 2024 • Minchan Kim, Minyeong Kim, Junik Bae, Suhwan Choi, Sungkyung Kim, Buru Chang

Subsequently, ESREAL computes token-level hallucination scores by assessing the semantic similarity of aligned regions based on the type of hallucination.

Hallucination Semantic Similarity +1
