Search Results for author: Jaehun Kim

Found 13 papers, 4 papers with code

FreGrad: Lightweight and Fast Frequency-aware Diffusion Vocoder

2 code implementations • 18 Jan 2024 • Tan Dat Nguyen, Ji-Hoon Kim, Youngjoon Jang, Jaehun Kim, Joon Son Chung

The goal of this paper is to generate realistic audio with a lightweight and fast diffusion-based vocoder, named FreGrad.

Similar but Faster: Manipulation of Tempo in Music Audio Embeddings for Tempo Prediction and Search

no code implementations • 17 Jan 2024 • Matthew C. McCallum, Florian Henkel, Jaehun Kim, Samuel E. Sandberg, Matthew E. P. Davies

We propose tempo translation functions that allow for efficient manipulation of tempo within a pre-existing embedding space whilst maintaining other properties such as genre.

Data Augmentation, Retrieval, +1
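
A minimal sketch of the general idea behind such a tempo translation function: a learned map that shifts a pre-computed audio embedding according to a target tempo ratio. The affine form, dimensions, and names below are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

# Hypothetical affine tempo translation of a pre-computed audio embedding.
# W and b would be learned from pairs of embeddings of the same track at
# different tempi; here they are random placeholders purely for illustration.
rng = np.random.default_rng(0)
EMB_DIM = 128

W = rng.normal(scale=0.01, size=(EMB_DIM, EMB_DIM + 1))  # learned map (placeholder)
b = np.zeros(EMB_DIM)

def translate_tempo(embedding: np.ndarray, tempo_ratio: float) -> np.ndarray:
    """Shift an embedding to reflect a tempo change by `tempo_ratio`
    (e.g. 1.2 = 20% faster), nominally leaving other attributes intact."""
    x = np.concatenate([embedding, [np.log(tempo_ratio)]])
    return embedding + W @ x + b  # residual update keeps the original point nearby

query = rng.normal(size=EMB_DIM)
faster_query = translate_tempo(query, tempo_ratio=1.2)
# `faster_query` can then be used for nearest-neighbour tempo search
# in the same embedding space as the original catalogue.
```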

On the Effect of Data-Augmentation on Local Embedding Properties in the Contrastive Learning of Music Audio Representations

no code implementations • 17 Jan 2024 • Matthew C. McCallum, Matthew E. P. Davies, Florian Henkel, Jaehun Kim, Samuel E. Sandberg

Similarly, we show that the optimal selection of data augmentation strategies for contrastive learning of music audio embeddings is dependent on the downstream task, highlighting this as an important embedding design decision.

Contrastive Learning, Data Augmentation

Seeing Through the Conversation: Audio-Visual Speech Separation based on Diffusion Model

no code implementations • 30 Oct 2023 • Suyeon Lee, Chaeyoung Jung, Youngjoon Jang, Jaehun Kim, Joon Son Chung

For an effective fusion of the two modalities for diffusion, we also propose a cross-attention-based feature fusion mechanism.

Speech Separation
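
A minimal sketch of what a cross-attention-based fusion of audio and visual features could look like in PyTorch; the dimensions, module layout, and residual design below are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Illustrative cross-attention fusion: audio features attend to visual
    features (dimensions and layout are assumptions, not the paper's design)."""
    def __init__(self, dim: int = 256, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, audio_feat, visual_feat):
        # audio_feat: (B, T_a, dim) queries; visual_feat: (B, T_v, dim) keys/values
        fused, _ = self.attn(query=audio_feat, key=visual_feat, value=visual_feat)
        return self.norm(audio_feat + fused)  # residual connection

fusion = CrossModalFusion()
audio = torch.randn(2, 100, 256)    # e.g. noisy-mixture features
video = torch.randn(2, 25, 256)     # e.g. lip-region features
conditioned = fusion(audio, video)  # conditioning signal for the diffusion model
```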

Let There Be Sound: Reconstructing High Quality Speech from Silent Videos

no code implementations • 29 Aug 2023 • Ji-Hoon Kim, Jaehun Kim, Joon Son Chung

In this paper, we propose a novel lip-to-speech system that significantly improves the generation quality by alleviating the one-to-many mapping problem from multiple perspectives.

Contrastive Learning for Cross-modal Artist Retrieval

no code implementations • 12 Aug 2023 • Andres Ferraro, Jaehun Kim, Sergio Oramas, Andreas Ehmann, Fabien Gouyon

We demonstrate that our method successfully combines complementary information from diverse modalities and is more robust to missing modality data (i.e., it better handles the retrieval of artists whose available modality embeddings differ from those of the query artist).

Contrastive Learning, Retrieval
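
A minimal sketch of a generic symmetric contrastive (InfoNCE-style) objective that aligns two modalities of the same artist; this is a common formulation chosen for illustration, not necessarily the authors' exact loss.

```python
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(emb_a, emb_b, temperature: float = 0.07):
    """Generic symmetric InfoNCE loss pulling together embeddings of the same
    artist from two modalities (rows of emb_a and emb_b are paired)."""
    a = F.normalize(emb_a, dim=-1)
    b = F.normalize(emb_b, dim=-1)
    logits = a @ b.T / temperature               # (N, N) cosine similarities
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

# Toy usage: 8 artists, hypothetical 128-d audio and biography embeddings.
audio_emb = torch.randn(8, 128)
text_emb = torch.randn(8, 128)
loss = cross_modal_contrastive_loss(audio_emb, text_emb)
```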

Generative Autoregressive Networks for 3D Dancing Move Synthesis from Music

no code implementations • 11 Nov 2019 • Hyemin Ahn, Jaehun Kim, Kihyun Kim, Songhwai Oh

The trained dance pose generator, which is a generative autoregressive model, is able to synthesize a dance sequence longer than 5,000 pose frames.
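
A minimal sketch of an autoregressive generation loop in which each pose is predicted from the previous pose and a per-frame music feature; the architecture, feature dimensions, and recurrence below are toy assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

class PoseGenerator(nn.Module):
    """Toy autoregressive pose generator: predicts the next pose from the
    previous pose and a per-frame music feature (purely illustrative)."""
    def __init__(self, pose_dim: int = 51, music_dim: int = 32, hidden: int = 256):
        super().__init__()
        self.rnn = nn.GRUCell(pose_dim + music_dim, hidden)
        self.out = nn.Linear(hidden, pose_dim)

    @torch.no_grad()
    def generate(self, music_feats, init_pose):
        # music_feats: (T, music_dim); init_pose: (pose_dim,)
        h = torch.zeros(self.rnn.hidden_size)
        pose, poses = init_pose, []
        for m in music_feats:                       # one step per music frame
            h = self.rnn(torch.cat([pose, m]).unsqueeze(0), h.unsqueeze(0)).squeeze(0)
            pose = self.out(h)                      # next pose is fed back in
            poses.append(pose)
        return torch.stack(poses)                   # (T, pose_dim)

gen = PoseGenerator()
seq = gen.generate(torch.randn(5000, 32), torch.zeros(51))  # a long sequence
```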

Are Nearby Neighbors Relatives?: Testing Deep Music Embeddings

no code implementations • 15 Apr 2019 • Jaehun Kim, Julián Urbano, Cynthia C. S. Liem, Alan Hanjalic

The underlying assumption is that, if a deep representation is to be trusted, distance consistency between known related points should be maintained in both the input audio space and the corresponding latent deep space.
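
A minimal sketch of one way to quantify such distance consistency, using agreement over random triplets between the input space and the embedding space; this score and its parameters are hypothetical, not the paper's exact protocol.

```python
import numpy as np

def triplet_distance_agreement(x_input, x_embed, n_triplets=1000, seed=0):
    """Fraction of random (anchor, a, b) triplets for which the embedding
    space agrees with the input space about which of a or b is closer to
    the anchor. A hypothetical consistency score, not the paper's metric."""
    rng = np.random.default_rng(seed)
    n = len(x_input)
    agree = 0
    for _ in range(n_triplets):
        i, j, k = rng.choice(n, size=3, replace=False)
        in_order = np.linalg.norm(x_input[i] - x_input[j]) < np.linalg.norm(x_input[i] - x_input[k])
        emb_order = np.linalg.norm(x_embed[i] - x_embed[j]) < np.linalg.norm(x_embed[i] - x_embed[k])
        agree += int(in_order == emb_order)
    return agree / n_triplets

# Toy check: 200 items with 1000-d "audio" features and 64-d embeddings.
rng = np.random.default_rng(1)
audio_space = rng.normal(size=(200, 1000))
latent_space = rng.normal(size=(200, 64))
print(triplet_distance_agreement(audio_space, latent_space))  # ~0.5 for unrelated spaces
```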

Transfer Learning of Artist Group Factors to Musical Genre Classification

1 code implementation • 5 May 2018 • Jaehun Kim, Minz Won, Xavier Serra, Cynthia C. S. Liem

The automated recognition of music genres from audio information is a challenging problem, as genre labels are subjective and noisy.

Classification, General Classification, +2

One Deep Music Representation to Rule Them All?: A comparative analysis of different representation learning strategies

1 code implementation • 12 Feb 2018 • Jaehun Kim, Julián Urbano, Cynthia C. S. Liem, Alan Hanjalic

In this paper, we present the results of our investigation into the most important factors for generating deep representations for data and learning tasks in the music domain.

Information Retrieval, Music Information Retrieval, +3

Deep convolutional neural networks for predominant instrument recognition in polyphonic music

1 code implementation • 31 May 2016 • Yoonchang Han, Jaehun Kim, Kyogu Lee

We train our network on fixed-length music excerpts with a single labeled predominant instrument and estimate an arbitrary number of predominant instruments from an audio signal of variable length.

Information Retrieval, Instrument Recognition, +3
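
A minimal sketch of one common way to bridge fixed-length training and variable-length inference: slide a fixed-length window over the signal, average the per-window multi-label scores, and threshold. The window length, aggregation, and threshold below are assumptions, not necessarily the paper's settings.

```python
import numpy as np

def predict_instruments(signal, model_fn, sr=22050, win_sec=1.0, threshold=0.5):
    """Slide a fixed-length window over a variable-length signal, average the
    per-window multi-label scores, and threshold to get the predominant
    instruments. Window length, averaging, and threshold are assumptions."""
    win = int(win_sec * sr)
    scores = []
    for start in range(0, max(len(signal) - win, 0) + 1, win):
        excerpt = signal[start:start + win]
        if len(excerpt) < win:                       # zero-pad the last excerpt
            excerpt = np.pad(excerpt, (0, win - len(excerpt)))
        scores.append(model_fn(excerpt))             # per-class probabilities
    mean_scores = np.mean(scores, axis=0)
    return np.where(mean_scores >= threshold)[0], mean_scores

# Toy usage with a dummy 11-class "model" standing in for the trained CNN.
dummy_model = lambda x: np.random.default_rng(abs(int(x.sum())) % 2**32).random(11)
labels, scores = predict_instruments(np.random.randn(5 * 22050), dummy_model)
```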
