no code implementations • 2 Feb 2024 • Jaeyeon Kim, Injune Hwang, Kyogu Lee
We propose a framework to learn semantics from raw audio signals using two types of representations, encoding contextual and phonetic information, respectively.
1 code implementation • 31 Jan 2024 • Jaeyeon Kim, JaeYoon Jung, Jinjoo Lee, Sang Hoon Woo
We also introduce a new training objective called masked codec modeling that improves acoustic awareness of the pretrained language model.
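The paper does not spell out the masking procedure here, but masked codec modeling presumably follows the masked-language-modeling recipe applied to discrete audio codec tokens: replace a random fraction of tokens with a mask symbol and train the model to recover the originals. A minimal sketch of that masking step (names, ratio, and mask id are illustrative assumptions, not the paper's implementation):

```python
import random

def mask_codec_tokens(tokens, mask_id, mask_ratio=0.15, seed=0):
    """Randomly replace a fraction of discrete codec tokens with a
    mask token; the masked positions become prediction targets,
    analogous to masked language modeling over audio codec sequences.
    Returns (masked sequence, {position: original token})."""
    rng = random.Random(seed)
    masked = list(tokens)
    targets = {}
    for i in range(len(tokens)):
        if rng.random() < mask_ratio:
            targets[i] = tokens[i]   # original token to be predicted
            masked[i] = mask_id      # hide it from the model input
    return masked, targets
```

A language model trained on such sequences must use surrounding codec tokens to reconstruct the hidden ones, which is one plausible way the objective could encourage acoustic awareness.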
Ranked #1 on Audio Captioning on AudioCaps
1 code implementation • 20 Sep 2023 • Ka Chun Shum, Jaeyeon Kim, Binh-Son Hua, Duc Thanh Nguyen, Sai-Kit Yeung
Specifically, to insert a new foreground object represented by a set of multi-view images into a background radiance field, we use a text-to-image diffusion model to learn and generate combined images that fuse the object of interest into the given background across views.
2 code implementations • 24 Feb 2023 • Junhyeok Lee, Wonbin Jung, Hyunjae Cho, Jaeyeon Kim, Jaehwan Kim
Previous pitch-controllable text-to-speech (TTS) models rely on directly modeling fundamental frequency, leading to low variance in synthesized speech.
no code implementations • 16 Nov 2022 • Jaeyeon Kim, Binh-Son Hua, Duc Thanh Nguyen, Sai-Kit Yeung
In this paper, we propose a new method for mapping a 3D point cloud to the latent space of a 3D generative adversarial network.
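Mapping a point cloud into a GAN's latent space (GAN inversion) typically means finding a latent code whose generated cloud matches the input under some reconstruction loss. The abstract does not name the loss, but Chamfer distance is the standard choice for point clouds; a self-contained sketch of it (pure Python, for illustration only):

```python
def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point sets a and b,
    each a list of 3D tuples: average squared distance from every
    point to its nearest neighbor in the other set, in both directions."""
    def sq_dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))
    d_ab = sum(min(sq_dist(p, q) for q in b) for p in a) / len(a)
    d_ba = sum(min(sq_dist(q, p) for p in a) for q in b) / len(b)
    return d_ab + d_ba
```

In an optimization-based inversion, one would minimize this distance between the generator's output and the target cloud with respect to the latent code; whether the paper uses this loss or an encoder-based mapping is not stated here.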
no code implementations • ICCV 2021 • Jaeyeon Kim, Binh-Son Hua, Duc Thanh Nguyen, Sai-Kit Yeung
With recent developments in convolutional neural networks, deep learning for 3D point clouds has shown significant progress in various 3D scene understanding tasks, e.g., object recognition and semantic segmentation.