Search Results for author: Junhyeok Lee

Found 21 papers, 12 papers with code

Improving Factuality of 3D Brain MRI Report Generation with Paired Image-domain Retrieval and Text-domain Augmentation

no code implementations • 23 Nov 2024 • Junhyeok Lee, Yujin Oh, Dahyoun Lee, Hyon Keun Joh, Chul-Ho Sohn, Sung Hyun Baik, Cheol Kyu Jung, Jung Hyun Park, Kyu Sung Choi, Byung-Hoon Kim, Jong Chul Ye

PIRTA mitigates the need for learning cross-modal mapping, which poses difficulty in image-to-text generation, by casting the cross-modal mapping problem as an in-domain retrieval of similar DWI images that have paired ground-truth text radiology reports.
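The snippet above frames report generation as in-domain retrieval: find gallery images similar to the query and reuse their paired reports. A minimal sketch of that retrieval step, using cosine similarity over precomputed image embeddings (the function name, embedding shapes, and report list are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def retrieve_report(query_emb, gallery_embs, reports, k=1):
    """Return the report(s) paired with the k gallery images closest to the
    query image in embedding space (cosine similarity). Illustrates casting
    cross-modal mapping as in-domain image retrieval."""
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    sims = g @ q                     # cosine similarity to every gallery image
    top = np.argsort(-sims)[:k]      # indices of the k most similar images
    return [reports[i] for i in top]

gallery = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
reports = ["report A", "report B", "report C"]
print(retrieve_report(np.array([1.0, 0.0]), gallery, reports, k=2))
```

In practice the embeddings would come from an image encoder trained on DWI scans; the sketch only shows the nearest-neighbor lookup.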

Tasks: Cross-Modal Retrieval, Image to text, +4

Super Monotonic Alignment Search

1 code implementation • 12 Sep 2024 • Junhyeok Lee, Hyeongju Kim

Monotonic alignment search (MAS), introduced by Glow-TTS, is one of the most popular algorithms in TTS for estimating unknown alignments between text and speech.
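MAS finds the monotonic text-to-speech alignment that maximizes total log-likelihood via dynamic programming. A minimal NumPy sketch of the standard Glow-TTS-style recursion (a reference implementation, not the vectorized/accelerated version this paper is about):

```python
import numpy as np

def monotonic_alignment_search(log_p):
    """Find the monotonic alignment between T_text tokens and T_mel frames
    that maximizes total log-likelihood, via dynamic programming.

    log_p: (T_text, T_mel) array of per-pair log-likelihoods.
    Returns an int array of length T_mel mapping each frame to a token index,
    non-decreasing from token 0 to token T_text - 1.
    """
    T_text, T_mel = log_p.shape
    Q = np.full((T_text, T_mel), -np.inf)
    Q[0, 0] = log_p[0, 0]
    for j in range(1, T_mel):
        for i in range(T_text):
            stay = Q[i, j - 1]                           # same token as previous frame
            move = Q[i - 1, j - 1] if i > 0 else -np.inf  # advance to next token
            Q[i, j] = log_p[i, j] + max(stay, move)
    # Backtrack from the last token at the last frame.
    path = np.zeros(T_mel, dtype=np.int64)
    i = T_text - 1
    for j in range(T_mel - 1, -1, -1):
        path[j] = i
        if i > 0 and j > 0 and Q[i - 1, j - 1] >= Q[i, j - 1]:
            i -= 1
    return path
```

The inner loop is what implementations like this paper's parallelize over tokens, since each column of `Q` depends only on the previous column.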

DualSpeech: Enhancing Speaker-Fidelity and Text-Intelligibility Through Dual Classifier-Free Guidance

no code implementations • 26 Aug 2024 • Jinhyeok Yang, Junhyeok Lee, Hyeong-Seok Choi, Seunghun Ji, Hyeongju Kim, Juheon Lee

Text-to-Speech (TTS) models have advanced significantly, aiming to accurately replicate human speech's diversity, including unique speaker identities and linguistic nuances.

Tasks: Diversity, text-to-speech, +1

JenGAN: Stacked Shifted Filters in GAN-Based Speech Synthesis

no code implementations • 10 Jun 2024 • Hyunjae Cho, Junhyeok Lee, Wonbin Jung

Non-autoregressive GAN-based neural vocoders are widely used due to their fast inference speed and high perceptual quality.

Tasks: Speech Synthesis

Diversifying and Expanding Frequency-Adaptive Convolution Kernels for Sound Event Detection

1 code implementation • 8 Jun 2024 • Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Junhyeok Lee, Yong-Hwa Park

Frequency dynamic convolution (FDY conv) has shown state-of-the-art performance in sound event detection (SED) using frequency-adaptive kernels obtained by a frequency-varying combination of basis kernels.
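The core idea in the snippet above is that each frequency bin gets its own kernel, built as a weighted mixture of a few shared basis kernels. A minimal NumPy sketch of that mixing step (1-D time convolution, fixed per-frequency weights; the real FDY conv predicts the weights from the input and works on 2-D feature maps):

```python
import numpy as np

def fdy_conv1d(x, basis, weights):
    """Frequency-adaptive convolution along time: frequency bin f uses the
    kernel sum_k weights[f, k] * basis[k] -- a frequency-varying combination
    of K shared basis kernels.

    x: (F, T) spectrogram-like input; basis: (K, L) with L odd;
    weights: (F, K). Returns an (F, T) output with 'same' padding.
    """
    F, T = x.shape
    K, L = basis.shape
    kernels = weights @ basis                 # (F, L): one kernel per frequency bin
    pad = L // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    out = np.zeros_like(x, dtype=float)
    for f in range(F):
        # np.convolve flips the kernel; fine for this illustrative sketch.
        out[f] = np.convolve(xp[f], kernels[f], mode="valid")[:T]
    return out
```

With `K` small, the layer adds only `F * K` mixing weights on top of the shared bases, which is what makes the frequency adaptation cheap.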

Tasks: Event Detection, Sound Event Detection

REVECA: Adaptive Planning and Trajectory-based Validation in Cooperative Language Agents using Information Relevance and Relative Proximity

no code implementations • 27 May 2024 • Seungwon Seo, SeongRae Noh, Junhyeok Lee, Soobin Lim, Won Hee Lee, Hyeongyeop Kang

We address the challenge of multi-agent cooperation, where agents achieve a common goal by cooperating with decentralized agents under complex partial observations.

Tasks: Management

LatentSwap: An Efficient Latent Code Mapping Framework for Face Swapping

1 code implementation • 28 Feb 2024 • Changho Choi, Minho Kim, Junhyeok Lee, Hyoung-Kyu Song, Younggeun Kim, Seungryong Kim

We show that our framework is applicable to other generators such as StyleNeRF, paving the way to 3D-aware face swapping, and is also compatible with other downstream StyleGAN2 generator tasks.

Tasks: Face Swapping

VIFS: An End-to-End Variational Inference for Foley Sound Synthesis

1 code implementation • 8 Jun 2023 • Junhyeok Lee, Hyeonuk Nam, Yong-Hwa Park

Unlike TTS models, which generate short pronunciations from phonemes and a speaker identity, the category-to-sound problem requires generating diverse sounds from only a category index.

Tasks: Speech Synthesis, text-to-speech, +2

PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS

2 code implementations • 24 Feb 2023 • Junhyeok Lee, Wonbin Jung, Hyunjae Cho, Jaeyeon Kim, Jaehwan Kim

Previous pitch-controllable text-to-speech (TTS) models rely on directly modeling fundamental frequency, leading to low variance in synthesized speech.

Tasks: Decoder, text-to-speech, +2

Direct Preference-based Policy Optimization without Reward Modeling

2 code implementations • NeurIPS 2023 • Gaon An, Junhyeok Lee, Xingdong Zuo, Norio Kosaka, Kyung-Min Kim, Hyun Oh Song

We apply our algorithm to offline RL tasks with actual human preference labels and show that our algorithm outperforms or is on par with the existing PbRL methods.

Tasks: Contrastive Learning, Offline RL, +2

PhaseAug: A Differentiable Augmentation for Speech Synthesis to Simulate One-to-Many Mapping

2 code implementations • 8 Nov 2022 • Junhyeok Lee, Seungu Han, Hyunjae Cho, Wonbin Jung

Previous generative adversarial network (GAN)-based neural vocoders are trained to reconstruct the exact ground truth waveform from the paired mel-spectrogram and do not consider the one-to-many relationship of speech synthesis.
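The one-to-many relationship in the snippet above comes from phase: many waveforms share the same magnitude spectrum. A minimal sketch of simulating this by randomly rotating the phase of each frequency bin (PhaseAug itself operates differentiably on STFT frames during training; this sketch uses a single FFT over the whole signal purely for illustration):

```python
import numpy as np

def phase_rotate(wav, rng):
    """Rotate the phase of each frequency bin by a random angle, producing a
    waveform with the same magnitude spectrum but different phase -- one of
    the many valid targets for the same mel-spectrogram."""
    spec = np.fft.rfft(wav)
    phi = rng.uniform(-np.pi, np.pi, size=spec.shape)
    # Keep DC (and Nyquist, for even lengths) real so the inverse stays real.
    phi[0] = 0.0
    if wav.size % 2 == 0:
        phi[-1] = 0.0
    return np.fft.irfft(spec * np.exp(1j * phi), n=wav.size)
```

Because `|exp(i*phi)| = 1`, the augmented waveform's magnitude spectrum is unchanged, so a vocoder is no longer pushed to reproduce one exact ground-truth waveform.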

Tasks: Generative Adversarial Network, Speech Synthesis

SANE-TTS: Stable And Natural End-to-End Multilingual Text-to-Speech

no code implementations • 24 Jun 2022 • Hyunjae Cho, Wonbin Jung, Junhyeok Lee, Sang Hoon Woo

Due to the difficulty of obtaining a multilingual corpus for a given speaker, training a multilingual TTS model with monolingual corpora is unavoidable.

Tasks: Rhythm, text-to-speech, +1

NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates

5 code implementations • 17 Jun 2022 • Seungu Han, Junhyeok Lee

Conventionally, audio super-resolution models fix the initial and target sampling rates, which necessitates training a separate model for each pair of sampling rates.

Tasks: Audio Super-Resolution, Super-Resolution

Query-Efficient and Scalable Black-Box Adversarial Attacks on Discrete Sequential Data via Bayesian Optimization

1 code implementation • 17 Jun 2022 • Deokjae Lee, Seungyong Moon, Junhyeok Lee, Hyun Oh Song

We focus on the problem of adversarial attacks against models on discrete sequential data in the black-box setting where the attacker aims to craft adversarial examples with limited query access to the victim model.

Tasks: Bayesian Optimization

Talking Face Generation with Multilingual TTS

no code implementations • CVPR 2022 • Hyoung-Kyu Song, Sang Hoon Woo, Junhyeok Lee, Seungmin Yang, Hyunjae Cho, Youseong Lee, Dongho Choi, Kang-wook Kim

In this work, we propose a joint system combining a talking face generation system with a text-to-speech system that can generate multilingual talking face videos from only the text input.

Tasks: Talking Face Generation, text-to-speech, +2

Controllable and Interpretable Singing Voice Decomposition via Assem-VC

1 code implementation • 25 Oct 2021 • Kang-wook Kim, Junhyeok Lee

We propose a singing decomposition system that encodes time-aligned linguistic content, pitch, and source speaker identity via Assem-VC.

Tasks: Voice Conversion

NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling

3 code implementations • 6 Apr 2021 • Junhyeok Lee, Seungu Han

In this work, we introduce NU-Wave, the first neural audio upsampling model to produce waveforms of sampling rate 48kHz from coarse 16kHz or 24kHz inputs, while prior works could generate only up to 16kHz.

Tasks: Audio Super-Resolution, Super-Resolution

Assem-VC: Realistic Voice Conversion by Assembling Modern Speech Synthesis Techniques

1 code implementation • 2 Apr 2021 • Kang-wook Kim, Seung-won Park, Junhyeok Lee, Myun-chul Joe

Recent works on voice conversion (VC) focus on preserving the rhythm and the intonation as well as the linguistic content.

Tasks: Decoder, Rhythm, +2
