Search Results for author: Myeonghun Jeong

Found 12 papers, 2 papers with code

SegINR: Segment-wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech

no code implementations • 7 Oct 2024 Minchan Kim, Myeonghun Jeong, Joun Yeop Lee, Nam Soo Kim

It leverages an optimal text encoder to extract embeddings, transforming each into a segment of frame-level features using a conditional implicit neural representation (INR).

Computational Efficiency • Text-to-Speech +1
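For intuition about the segment-wise INR described above, here is a minimal, hypothetical sketch of a conditional implicit neural representation that maps one token embedding plus a frame position to a frame-level feature; the class name, layer sizes, and dimensions are illustrative assumptions, not the paper's code.

```python
# Hypothetical sketch: a text-token embedding conditions an MLP that maps
# a normalized frame position within the segment to a frame-level feature.
import torch
import torch.nn as nn

class ConditionalINR(nn.Module):
    def __init__(self, cond_dim=256, feat_dim=80, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(cond_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, token_emb, num_frames):
        # token_emb: (cond_dim,) embedding of one text token.
        # Query the INR at each frame position of the segment.
        t = torch.linspace(0.0, 1.0, num_frames).unsqueeze(-1)  # (T, 1)
        cond = token_emb.unsqueeze(0).expand(num_frames, -1)    # (T, cond_dim)
        return self.net(torch.cat([cond, t], dim=-1))           # (T, feat_dim)

segment = ConditionalINR()(torch.randn(256), num_frames=12)  # one token -> 12 frames
```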

High Fidelity Text-to-Speech Via Discrete Tokens Using Token Transducer and Group Masked Language Model

no code implementations • 25 Jun 2024 Joun Yeop Lee, Myeonghun Jeong, Minchan Kim, Ji-Hyun Lee, Hoon-Young Cho, Nam Soo Kim

We propose a novel two-stage text-to-speech (TTS) framework with two types of discrete tokens, i.e., semantic and acoustic tokens, for high-fidelity speech synthesis.

Computational Efficiency • Language Modeling +4
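A schematic of the two-stage pipeline the abstract describes, with text mapped first to semantic tokens and then to acoustic tokens; the three stage objects below are placeholder stand-ins for a token transducer, a group masked language model, and a codec decoder, not the authors' models.

```python
# Schematic only: the three components are assumed placeholder objects.
def synthesize(text, token_transducer, group_masked_lm, codec_decoder):
    semantic_tokens = token_transducer.generate(text)          # stage 1: text -> semantic tokens
    acoustic_tokens = group_masked_lm.infill(semantic_tokens)  # stage 2: semantic -> acoustic tokens
    return codec_decoder(acoustic_tokens)                      # waveform from acoustic (codec) tokens
```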

Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction

no code implementations • 3 Jan 2024 Minchan Kim, Myeonghun Jeong, Byoung Jin Choi, Semin Kim, Joun Yeop Lee, Nam Soo Kim

We also delve into the inference speed and prosody control capabilities of our approach, highlighting the potential of neural transducers in TTS frameworks.

Text-to-Speech

Towards single integrated spoofing-aware speaker verification embeddings

1 code implementation • 30 May 2023 Sung Hwan Mun, Hye-jin Shim, Hemlata Tak, Xin Wang, Xuechen Liu, Md Sahidullah, Myeonghun Jeong, Min Hyun Han, Massimiliano Todisco, Kong Aik Lee, Junichi Yamagishi, Nicholas Evans, Tomi Kinnunen, Nam Soo Kim, Jee-weon Jung

Second, competitive performance should be demonstrated compared to the fusion of automatic speaker verification (ASV) and countermeasure (CM) embeddings, which outperformed single embedding solutions by a large margin in the SASV2022 challenge.

Speaker Verification
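To make the comparison in the snippet concrete, a hedged sketch of the two scoring setups: fusing separate ASV and countermeasure scores versus scoring with a single spoofing-aware embedding. The function names and the simple additive fusion are illustrative assumptions, not the SASV2022 baselines.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def fused_score(asv_enroll, asv_test, cm_score):
    # Fusion baseline: combine a speaker-similarity score with a
    # separately produced countermeasure (spoof-detection) score.
    return cosine(asv_enroll, asv_test) + cm_score

def single_embedding_score(enroll_emb, test_emb):
    # Single-embedding setup: one embedding is expected to carry
    # both speaker identity and spoofing cues.
    return cosine(enroll_emb, test_emb)
```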

SNAC: Speaker-normalized affine coupling layer in flow-based architecture for zero-shot multi-speaker text-to-speech

no code implementations • 30 Nov 2022 Byoung Jin Choi, Myeonghun Jeong, Joun Yeop Lee, Nam Soo Kim

Zero-shot multi-speaker text-to-speech (ZSM-TTS) models aim to generate a speech sample with the voice characteristic of an unseen speaker.

Speech Synthesis • Text-to-Speech +1
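A minimal sketch, under assumed layer sizes and a simple conditioning scheme, of an affine coupling layer whose scale and shift are predicted from a speaker embedding, in the spirit of SNAC; this is not the published architecture.

```python
import torch
import torch.nn as nn

class SpeakerConditionedCoupling(nn.Module):
    def __init__(self, dim=192, spk_dim=256, hidden=256):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half + spk_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * self.half),  # predicts log-scale and shift
        )

    def forward(self, x, spk_emb):
        # Split channels; transform one half conditioned on the other half
        # plus the speaker embedding (an invertible affine transform).
        xa, xb = x[..., :self.half], x[..., self.half:]
        log_s, t = self.net(torch.cat([xa, spk_emb], dim=-1)).chunk(2, dim=-1)
        yb = xb * torch.exp(log_s) + t
        return torch.cat([xa, yb], dim=-1), log_s.sum(-1)  # output, log-determinant

y, logdet = SpeakerConditionedCoupling()(torch.randn(4, 192), torch.randn(4, 256))
```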

Adversarial Speaker-Consistency Learning Using Untranscribed Speech Data for Zero-Shot Multi-Speaker Text-to-Speech

no code implementations • 12 Oct 2022 Byoung Jin Choi, Myeonghun Jeong, Minchan Kim, Sung Hwan Mun, Nam Soo Kim

Several recently proposed text-to-speech (TTS) models have achieved human-level speech quality in single-speaker and multi-speaker scenarios with a set of pre-defined speakers.

Text-to-Speech

Transfer Learning Framework for Low-Resource Text-to-Speech using a Large-Scale Unlabeled Speech Corpus

no code implementations • 29 Mar 2022 Minchan Kim, Myeonghun Jeong, Byoung Jin Choi, Sunghwan Ahn, Joun Yeop Lee, Nam Soo Kim

The experimental results verify the effectiveness of the proposed method in terms of naturalness, intelligibility, and speaker generalization.

Text-to-Speech +2

Diff-TTS: A Denoising Diffusion Model for Text-to-Speech

1 code implementation • 3 Apr 2021 Myeonghun Jeong, Hyeongju Kim, Sung Jun Cheon, Byoung Jin Choi, Nam Soo Kim

Although neural text-to-speech (TTS) models have attracted a lot of attention and succeeded in generating human-like speech, there is still room for improvement in naturalness and architectural efficiency.

Denoising • Speech Synthesis +2
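For reference, a hedged sketch of one DDPM reverse (denoising) step of the kind a diffusion TTS model applies to mel-spectrograms; `eps_model`, the schedule tensors, and the fixed-variance choice are assumptions for illustration, not Diff-TTS code.

```python
import torch

def reverse_step(x_t, t, eps_model, text_cond, alphas, alphas_bar, betas):
    # Predict the noise added at step t, then compute the posterior mean.
    eps = eps_model(x_t, t, text_cond)
    mean = (x_t - betas[t] / torch.sqrt(1.0 - alphas_bar[t]) * eps) / torch.sqrt(alphas[t])
    if t == 0:
        return mean  # final step: return the mean without adding noise
    # A common fixed-variance choice: sigma_t^2 = beta_t.
    return mean + torch.sqrt(betas[t]) * torch.randn_like(x_t)
```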
