Search Results for author: Wei-Hsiang Liao

Found 14 papers, 4 papers with code

MR-MT3: Memory Retaining Multi-Track Music Transcription to Mitigate Instrument Leakage

no code implementations • 15 Mar 2024 • Hao Hao Tan, Kin Wai Cheuk, Taemin Cho, Wei-Hsiang Liao, Yuki Mitsufuji

This paper presents enhancements to the MT3 model, a state-of-the-art (SOTA) token-based multi-instrument automatic music transcription (AMT) model.

Music Transcription

MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models

no code implementations • 9 Feb 2024 • Yixiao Zhang, Yukara Ikemiya, Gus Xia, Naoki Murata, Marco Martínez, Wei-Hsiang Liao, Yuki Mitsufuji, Simon Dixon

This paper introduces a novel approach to the editing of music generated by such models, enabling the modification of specific attributes, such as genre, mood and instrument, while maintaining other aspects unchanged.

Music Generation, Text-to-Music Generation

HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes

no code implementations • 31 Dec 2023 • Yuhta Takida, Yukara Ikemiya, Takashi Shibuya, Kazuki Shimada, Woosung Choi, Chieh-Hsin Lai, Naoki Murata, Toshimitsu Uesaka, Kengo Uchida, Wei-Hsiang Liao, Yuki Mitsufuji

Vector quantization (VQ) is a technique to deterministically learn features with discrete codebook representations.

Quantization, Representation Learning

Manifold Preserving Guided Diffusion

no code implementations • 28 Nov 2023 • Yutong He, Naoki Murata, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Dongjun Kim, Wei-Hsiang Liao, Yuki Mitsufuji, J. Zico Kolter, Ruslan Salakhutdinov, Stefano Ermon

Despite the recent advancements, conditional image generation still faces challenges of cost, generalizability, and the need for task-specific training.

Conditional Image Generation

On the Language Encoder of Contrastive Cross-modal Models

no code implementations • 20 Oct 2023 • Mengjie Zhao, Junya Ono, Zhi Zhong, Chieh-Hsin Lai, Yuhta Takida, Naoki Murata, Wei-Hsiang Liao, Takashi Shibuya, Hiromi Wakaki, Yuki Mitsufuji

Contrastive cross-modal models such as CLIP and CLAP aid various vision-language (VL) and audio-language (AL) tasks.

Sentence, Sentence Embedding, +1

Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion

1 code implementation • 1 Oct 2023 • Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Yutong He, Yuki Mitsufuji, Stefano Ermon

Consistency Models (CM) (Song et al., 2023) accelerate score-based diffusion model sampling at the cost of sample quality but lack a natural way to trade off quality for speed.

Ranked #2 on Image Generation on CIFAR-10

Denoising, Image Generation

Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription

no code implementations • 27 Sep 2023 • Frank Cwitkowitz, Kin Wai Cheuk, Woosung Choi, Marco A. Martínez-Ramírez, Keisuke Toyama, Wei-Hsiang Liao, Yuki Mitsufuji

Several works have explored multi-instrument transcription as a means to bolster the performance of models on low-resource tasks, but these methods face the same data availability issues.

Music Transcription

VRDMG: Vocal Restoration via Diffusion Posterior Sampling with Multiple Guidance

no code implementations • 13 Sep 2023 • Carlos Hernandez-Olivan, Koichi Saito, Naoki Murata, Chieh-Hsin Lai, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Yuki Mitsufuji

Restoring degraded music signals is essential to enhance audio quality for downstream music manipulation.

Bandwidth Extension

Automatic Piano Transcription with Hierarchical Frequency-Time Transformer

1 code implementation • 10 Jul 2023 • Keisuke Toyama, Taketo Akama, Yukara Ikemiya, Yuhta Takida, Wei-Hsiang Liao, Yuki Mitsufuji

This is especially helpful when determining the precise onset and offset for each note in the polyphonic piano content.

Music Transcription

Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects

1 code implementation • 4 Nov 2022 • Junghyun Koo, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Stefan Uhlich, Kyogu Lee, Yuki Mitsufuji

We propose an end-to-end music mixing style transfer system that converts the mixing style of an input multitrack to that of a reference song.

Contrastive Learning, Disentanglement, +2

Automatic music mixing with deep learning and out-of-domain data

1 code implementation • 24 Aug 2022 • Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Giorgio Fabbro, Stefan Uhlich, Chihiro Nagashima, Yuki Mitsufuji

Music mixing traditionally involves recording instruments in the form of clean, individual tracks and blending them into a final mixture using audio effects and expert knowledge (e.g., a mixing engineer).

Automatic DJ Transitions with Differentiable Audio Effects and Generative Adversarial Networks

no code implementations • 13 Oct 2021 • Bo-Yu Chen, Wei-Han Hsu, Wei-Hsiang Liao, Marco A. Martínez Ramírez, Yuki Mitsufuji, Yi-Hsuan Yang

A central task of a Disc Jockey (DJ) is to create a mixset of music with seamless transitions between adjacent tracks.

Generative Adversarial Network

Preventing Oversmoothing in VAE via Generalized Variance Parameterization

no code implementations • 17 Feb 2021 • Yuhta Takida, Wei-Hsiang Liao, Chieh-Hsin Lai, Toshimitsu Uesaka, Shusuke Takahashi, Yuki Mitsufuji

Variational autoencoders (VAEs) often suffer from posterior collapse, which is a phenomenon in which the learned latent space becomes uninformative.

AR-ELBO: Preventing Posterior Collapse Induced by Oversmoothing in Gaussian VAE

no code implementations • 1 Jan 2021 • Yuhta Takida, Wei-Hsiang Liao, Toshimitsu Uesaka, Shusuke Takahashi, Yuki Mitsufuji

Variational autoencoders (VAEs) often suffer from posterior collapse, which is a phenomenon in which the learned latent space becomes uninformative.

