Search Results for author: Junhyeok Lee

Found 14 papers, 9 papers with code

LatentSwap: An Efficient Latent Code Mapping Framework for Face Swapping

no code implementations28 Feb 2024 Changho Choi, Minho Kim, Junhyeok Lee, Hyoung-Kyu Song, Younggeun Kim, Seungryong Kim

We show that our framework is applicable to other generators such as StyleNeRF, paving the way to 3D-aware face swapping, and is also compatible with other downstream StyleGAN2 generator tasks.

Face Swapping

VIFS: An End-to-End Variational Inference for Foley Sound Synthesis

1 code implementation8 Jun 2023 Junhyeok Lee, Hyeonuk Nam, Yong-Hwa Park

Unlike TTS models, which generate short pronunciations from phonemes and a speaker identity, the category-to-sound problem requires generating diverse sounds from a category index alone.

Speech Synthesis Variational Inference

PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS

2 code implementations24 Feb 2023 Junhyeok Lee, Wonbin Jung, Hyunjae Cho, Jaeyeon Kim, Jaehwan Kim

Previous pitch-controllable text-to-speech (TTS) models rely on directly modeling fundamental frequency, leading to low variance in synthesized speech.

Variational Inference

Direct Preference-based Policy Optimization without Reward Modeling

1 code implementation NeurIPS 2023 Gaon An, Junhyeok Lee, Xingdong Zuo, Norio Kosaka, Kyung-Min Kim, Hyun Oh Song

We apply our algorithm to offline RL tasks with actual human preference labels and show that it outperforms or is on par with existing PbRL methods.

Contrastive Learning Offline RL +1

PhaseAug: A Differentiable Augmentation for Speech Synthesis to Simulate One-to-Many Mapping

2 code implementations8 Nov 2022 Junhyeok Lee, Seungu Han, Hyunjae Cho, Wonbin Jung

Previous generative adversarial network (GAN)-based neural vocoders are trained to reconstruct the exact ground-truth waveform from the paired mel-spectrogram, without considering the one-to-many relationship of speech synthesis.

Generative Adversarial Network Speech Synthesis
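The PhaseAug abstract above refers to the one-to-many mapping of speech synthesis: many waveforms with different phase share the same mel-spectrogram. A minimal sketch of the underlying idea — rotating the phase spectrum while preserving magnitudes — is shown below with a plain DFT in pure Python. This is only an illustration of phase rotation, not the paper's differentiable, per-frequency-bin augmentation used inside GAN training; the function names here are hypothetical.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform of a real sequence."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N)
                for n in range(N)) for k in range(N)]

def idft(X):
    """Inverse DFT, returning the real part of each sample."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N)
                for k in range(N)).real / N for n in range(N)]

def phase_rotate(x, phi):
    """Rotate every positive-frequency bin by +phi and every
    negative-frequency bin by -phi, so the reconstructed signal
    stays real-valued while its magnitude spectrum is unchanged."""
    X = dft(x)
    N = len(X)
    Y = list(X)
    for k in range(1, N):
        if N % 2 == 0 and k == N // 2:
            continue  # leave the Nyquist bin untouched to keep symmetry
        sign = 1.0 if k < N / 2 else -1.0
        Y[k] = X[k] * cmath.exp(1j * sign * phi)
    return idft(Y)

# A pure sine rotated by pi/2 becomes the corresponding cosine:
# the waveform changes, but the magnitude spectrum (and hence the
# mel-spectrogram) is identical -- the one-to-many mapping in action.
x = [math.sin(2 * math.pi * 3 * n / 64) for n in range(64)]
y = phase_rotate(x, math.pi / 2)
```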

SANE-TTS: Stable And Natural End-to-End Multilingual Text-to-Speech

no code implementations24 Jun 2022 Hyunjae Cho, Wonbin Jung, Junhyeok Lee, Sang Hoon Woo

Given the difficulty of obtaining a multilingual corpus for a given speaker, training a multilingual TTS model with monolingual corpora is unavoidable.

NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates

4 code implementations17 Jun 2022 Seungu Han, Junhyeok Lee

Conventional audio super-resolution models fix the initial and target sampling rates, which requires training a separate model for each pair of sampling rates.

Audio Super-Resolution Super-Resolution
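The NU-Wave 2 abstract above frames the limitation it removes: a model trained for one pair of sampling rates. To make the task itself concrete (not the paper's method — NU-Wave is a diffusion model), here is the classical baseline such models are compared against: naive linear interpolation by an integer factor, e.g. factor 3 for 16kHz to 48kHz. The function name is illustrative, and the sketch assumes the target rate is an integer multiple of the source rate.

```python
def upsample_linear(x, factor):
    """Upsample a waveform by an integer factor via linear
    interpolation between consecutive samples. A classical baseline
    for audio super-resolution, not the diffusion model from NU-Wave."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    out = []
    for i in range(len(x) - 1):
        for j in range(factor):
            t = j / factor  # fractional position between x[i] and x[i+1]
            out.append(x[i] * (1 - t) + x[i + 1] * t)
    out.append(x[-1])  # keep the final sample
    return out

# 2 samples at "16kHz" -> 4 samples at "48kHz" (factor 3)
print(upsample_linear([0.0, 3.0], 3))  # → [0.0, 1.0, 2.0, 3.0]
```

Unlike a trained model, interpolation trivially handles any integer rate pair, but it cannot reconstruct the missing high-frequency content — which is exactly the gap neural upsamplers target.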

Query-Efficient and Scalable Black-Box Adversarial Attacks on Discrete Sequential Data via Bayesian Optimization

1 code implementation17 Jun 2022 Deokjae Lee, Seungyong Moon, Junhyeok Lee, Hyun Oh Song

We focus on the problem of adversarial attacks against models on discrete sequential data in the black-box setting where the attacker aims to craft adversarial examples with limited query access to the victim model.

Bayesian Optimization

Talking Face Generation with Multilingual TTS

no code implementations CVPR 2022 Hyoung-Kyu Song, Sang Hoon Woo, Junhyeok Lee, Seungmin Yang, Hyunjae Cho, Youseong Lee, Dongho Choi, Kang-wook Kim

In this work, we propose a joint system combining a talking face generation system with a text-to-speech system that can generate multilingual talking face videos from only the text input.

Talking Face Generation Translation

Controllable and Interpretable Singing Voice Decomposition via Assem-VC

1 code implementation25 Oct 2021 Kang-wook Kim, Junhyeok Lee

We propose a singing decomposition system that encodes time-aligned linguistic content, pitch, and source speaker identity via Assem-VC.

Voice Conversion

NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling

3 code implementations6 Apr 2021 Junhyeok Lee, Seungu Han

In this work, we introduce NU-Wave, the first neural audio upsampling model to produce waveforms at a 48kHz sampling rate from coarse 16kHz or 24kHz inputs, whereas prior works could generate only up to 16kHz.

Audio Super-Resolution Super-Resolution

Assem-VC: Realistic Voice Conversion by Assembling Modern Speech Synthesis Techniques

1 code implementation2 Apr 2021 Kang-wook Kim, Seung-won Park, Junhyeok Lee, Myun-chul Joe

Recent works on voice conversion (VC) focus on preserving the rhythm and the intonation as well as the linguistic content.

Speech Synthesis Voice Conversion
