Search Results for author: Naoki Murata

Found 20 papers, 8 papers with code

MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training

no code implementations4 Jun 2024 Kengo Uchida, Takashi Shibuya, Yuhta Takida, Naoki Murata, Shusuke Takahashi, Yuki Mitsufuji

For high-quality and fast generation, we employ a variational autoencoder and latent diffusion model, and improve the performance with adversarial training.

Motion Synthesis

Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning

1 code implementation28 May 2024 Yixiao Zhang, Yukara Ikemiya, Woosung Choi, Naoki Murata, Marco A. Martínez-Ramírez, Liwei Lin, Gus Xia, Wei-Hsiang Liao, Yuki Mitsufuji, Simon Dixon

Recent advances in text-to-music editing, which employ text queries to modify music (e. g.\ by changing its style or adjusting instrumental components), present unique challenges and opportunities for AI-assisted music creation.

Understanding Multimodal Contrastive Learning Through Pointwise Mutual Information

no code implementations30 Apr 2024 Toshimitsu Uesaka, Taiji Suzuki, Yuhta Takida, Chieh-Hsin Lai, Naoki Murata, Yuki Mitsufuji

Multimodal representation learning to integrate different modalities, such as text, vision, and audio is important for real-world applications.

Classification Contrastive Learning +2

Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation

no code implementations28 Mar 2024 Yutong He, Alexander Robey, Naoki Murata, Yiding Jiang, Joshua Williams, George J. Pappas, Hamed Hassani, Yuki Mitsufuji, Ruslan Salakhutdinov, J. Zico Kolter

Prompt engineering is effective for controlling the output of text-to-image (T2I) generative models, but it is also laborious due to the need for manually crafted prompts.

In-Context Learning Language Modelling +3

MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models

1 code implementation9 Feb 2024 Yixiao Zhang, Yukara Ikemiya, Gus Xia, Naoki Murata, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Yuki Mitsufuji, Simon Dixon

This paper introduces a novel approach to the editing of music generated by such models, enabling the modification of specific attributes, such as genre, mood and instrument, while maintaining other aspects unchanged.

Music Generation Text-to-Music Generation

Manifold Preserving Guided Diffusion

no code implementations28 Nov 2023 Yutong He, Naoki Murata, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Dongjun Kim, Wei-Hsiang Liao, Yuki Mitsufuji, J. Zico Kolter, Ruslan Salakhutdinov, Stefano Ermon

Despite the recent advancements, conditional image generation still faces challenges of cost, generalizability, and the need for task-specific training.

Conditional Image Generation

Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion

1 code implementation1 Oct 2023 Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Yutong He, Yuki Mitsufuji, Stefano Ermon

Consistency Models (CM) (Song et al., 2023) accelerate score-based diffusion model sampling at the cost of sample quality but lack a natural way to trade-off quality for speed.

Denoising Image Generation

On the Equivalence of Consistency-Type Models: Consistency Models, Consistent Diffusion Models, and Fokker-Planck Regularization

no code implementations1 Jun 2023 Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Naoki Murata, Yuki Mitsufuji, Stefano Ermon

The emergence of various notions of ``consistency'' in diffusion models has garnered considerable attention and helped achieve improved sample quality, likelihood estimation, and accelerated sampling.

SAN: Inducing Metrizability of GAN with Discriminative Normalized Linear Layer

1 code implementation30 Jan 2023 Yuhta Takida, Masaaki Imaizumi, Takashi Shibuya, Chieh-Hsin Lai, Toshimitsu Uesaka, Naoki Murata, Yuki Mitsufuji

Generative adversarial networks (GANs) learn a target probability distribution by optimizing a generator and a discriminator with minimax objectives.

Image Generation

GibbsDDRM: A Partially Collapsed Gibbs Sampler for Solving Blind Inverse Problems with Denoising Diffusion Restoration

1 code implementation30 Jan 2023 Naoki Murata, Koichi Saito, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Yuki Mitsufuji, Stefano Ermon

Pre-trained diffusion models have been successfully used as priors in a variety of linear inverse problems, where the goal is to reconstruct a signal from noisy linear measurements.

Blind Image Deblurring Denoising +1

Unsupervised vocal dereverberation with diffusion-based generative models

no code implementations8 Nov 2022 Koichi Saito, Naoki Murata, Toshimitsu Uesaka, Chieh-Hsin Lai, Yuhta Takida, Takao Fukui, Yuki Mitsufuji

Removing reverb from reverberant music is a necessary technique to clean up audio for downstream music manipulations.

Diffiner: A Versatile Diffusion-based Generative Refiner for Speech Enhancement

1 code implementation27 Oct 2022 Ryosuke Sawata, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji

Although deep neural network (DNN)-based speech enhancement (SE) methods outperform the previous non-DNN-based ones, they often degrade the perceptual quality of generated outputs.

Denoising Speech Enhancement

FP-Diffusion: Improving Score-based Diffusion Models by Enforcing the Underlying Score Fokker-Planck Equation

1 code implementation9 Oct 2022 Chieh-Hsin Lai, Yuhta Takida, Naoki Murata, Toshimitsu Uesaka, Yuki Mitsufuji, Stefano Ermon

Score-based generative models (SGMs) learn a family of noise-conditional score functions corresponding to the data density perturbed with increasingly large amounts of noise.


SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization

1 code implementation16 May 2022 Yuhta Takida, Takashi Shibuya, WeiHsiang Liao, Chieh-Hsin Lai, Junki Ohmura, Toshimitsu Uesaka, Naoki Murata, Shusuke Takahashi, Toshiyuki Kumakura, Yuki Mitsufuji

In this paper, we propose a new training scheme that extends the standard VAE via novel stochastic dequantization and quantization, called stochastically quantized variational autoencoder (SQ-VAE).


Cannot find the paper you are looking for? You can Submit a new open access paper.