no code implementations • 1 Jun 2023 • Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Naoki Murata, Yuki Mitsufuji, Stefano Ermon
The emergence of various notions of "consistency" in diffusion models has garnered considerable attention and helped achieve improved sample quality, likelihood estimation, and accelerated sampling.
no code implementations • 18 May 2023 • Hao Shi, Kazuki Shimada, Masato Hirano, Takashi Shibuya, Yuichiro Koyama, Zhi Zhong, Shusuke Takahashi, Tatsuya Kawahara, Yuki Mitsufuji
Diffusion-based speech enhancement (SE) has been investigated recently, but its decoding is very time-consuming.
1 code implementation • 13 May 2023 • Ryosuke Sawata, Naoya Takahashi, Stefan Uhlich, Shusuke Takahashi, Yuki Mitsufuji
We modify the target network, i.e., the network architecture of the original DNN-based MSS model, by adding bridging paths for each output instrument to share their information.
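As a minimal sketch of the bridging idea (all function names and the elementwise toy "encoder" are invented for illustration, not the paper's architecture): per-instrument branches compute features, a bridge averages them, and each branch refines its own estimate with the shared information.

```python
# Sketch of "bridging paths" between per-instrument branches of an MSS
# network. The elementwise scaling "encoder" and the 0.5 mixing factor
# are purely illustrative; only the information-sharing pattern matters.

def branch_encode(mixture, weight):
    # hypothetical per-instrument encoder: a simple elementwise scaling
    return [weight * x for x in mixture]

def bridge(features_per_source):
    # bridging path: share information by averaging branch features
    n = len(features_per_source)
    return [sum(fs) / n for fs in zip(*features_per_source)]

def separate(mixture, weights):
    feats = [branch_encode(mixture, w) for w in weights]
    shared = bridge(feats)
    # each branch refines its estimate from its own and the shared features
    return [[0.5 * (f + s) for f, s in zip(feat, shared)] for feat in feats]

estimates = separate([1.0, 2.0, 3.0], weights=[0.2, 0.8])
```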
no code implementations • 10 May 2023 • Masato Hirano, Kazuki Shimada, Yuichiro Koyama, Shusuke Takahashi, Yuki Mitsufuji
We experimentally show that our refiner can provide a clearer harmonic structure of speech and improves the reference-free metric of perceptual quality for arbitrary preceding model architectures.
1 code implementation • 3 May 2023 • Silin Gao, Beatriz Borges, Soyoung Oh, Deniz Bayazit, Saya Kanno, Hiromi Wakaki, Yuki Mitsufuji, Antoine Bosselut
They must also learn to maintain consistent speaker personas for themselves throughout the narrative, so that their counterparts feel involved in a realistic conversation or story.
no code implementations • 27 Feb 2023 • Naoya Takahashi, Mayank K. Singh, Yuki Mitsufuji
Image-to-image translation and voice conversion enable the generation of a new facial image and voice while maintaining some of the semantics such as a pose in an image and linguistic content in audio, respectively.
no code implementations • 30 Jan 2023 • Yuhta Takida, Masaaki Imaizumi, Chieh-Hsin Lai, Toshimitsu Uesaka, Naoki Murata, Yuki Mitsufuji
Generative adversarial networks (GANs) learn a target probability distribution by optimizing a generator and a discriminator with minimax objectives.
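For reference, the standard minimax value function is V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))], which the discriminator maximizes and the generator minimizes. A toy evaluation on hand-picked logits (no trained networks involved):

```python
import math

# Toy evaluation of the GAN minimax value function
# V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))].
# The "discriminator" is a fixed sigmoid over hand-picked logits,
# standing in for a trained network purely for illustration.

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def value(real_logits, fake_logits):
    real_term = sum(math.log(sigmoid(s)) for s in real_logits) / len(real_logits)
    fake_term = sum(math.log(1.0 - sigmoid(s)) for s in fake_logits) / len(fake_logits)
    return real_term + fake_term

# a discriminator that scores real samples high and fakes low
v = value(real_logits=[2.0, 3.0], fake_logits=[-2.0, -1.0])
```

Both expectations are logs of probabilities, so the value is always negative; a better discriminator pushes it toward zero.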
no code implementations • 30 Jan 2023 • Naoki Murata, Koichi Saito, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Yuki Mitsufuji, Stefano Ermon
Pre-trained diffusion models have been successfully used as priors in a variety of linear inverse problems, where the goal is to reconstruct a signal from noisy linear measurements.
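A common way to use such priors is to add the data-likelihood gradient to the prior score, giving the posterior score grad_x log p(x|y) = grad_x log p(x) + A^T (y - A x) / sigma^2 for Gaussian measurement noise. A toy sketch, with a standard-normal prior standing in for the pre-trained diffusion model:

```python
# Sketch of posterior scoring for a linear inverse problem y = A x + noise.
# The standard-normal "prior" is illustrative; in practice a pre-trained
# diffusion model would supply prior_score.

def prior_score(x):
    # score of N(0, I): grad_x log p(x) = -x
    return [-xi for xi in x]

def likelihood_grad(x, y, A, sigma):
    # grad_x log N(y; A x, sigma^2) for a scalar measurement
    Ax = sum(a * xi for a, xi in zip(A, x))
    return [a * (y - Ax) / sigma ** 2 for a in A]

def posterior_score(x, y, A, sigma):
    return [p + l for p, l in zip(prior_score(x), likelihood_grad(x, y, A, sigma))]

# measurement y = 1 observes only the first coordinate of x
s = posterior_score(x=[0.0, 0.0], y=1.0, A=[1.0, 0.0], sigma=1.0)
```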
1 code implementation • 14 Dec 2022 • Hao-Wen Dong, Naoya Takahashi, Yuki Mitsufuji, Julian McAuley, Taylor Berg-Kirkpatrick
Further, videos in the wild often contain off-screen sounds and background noise that may hinder the model from learning the desired audio-textual correspondence.
no code implementations • 8 Nov 2022 • Koichi Saito, Naoki Murata, Toshimitsu Uesaka, Chieh-Hsin Lai, Yuhta Takida, Takao Fukui, Yuki Mitsufuji
Removing reverb from reverberant music is a necessary technique to clean up audio for downstream music manipulations.
1 code implementation • 4 Nov 2022 • Junghyun Koo, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Stefan Uhlich, Kyogu Lee, Yuki Mitsufuji
We propose an end-to-end music mixing style transfer system that converts the mixing style of an input multitrack to that of a reference song.
no code implementations • 27 Oct 2022 • Ryosuke Sawata, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji
Although deep neural network (DNN)-based speech enhancement (SE) methods outperform the previous non-DNN-based ones, they often degrade the perceptual quality of generated outputs.
1 code implementation • 23 Oct 2022 • Silin Gao, Jena D. Hwang, Saya Kanno, Hiromi Wakaki, Yuki Mitsufuji, Antoine Bosselut
Understanding rich narratives, such as dialogues and stories, often requires natural language processing systems to access relevant knowledge from commonsense knowledge graphs.
no code implementations • 20 Oct 2022 • Naoya Takahashi, Mayank Kumar Singh, Yuki Mitsufuji
In this work, we propose robust one-shot SVC (ROSVC) that performs any-to-any SVC robustly even on such distorted singing voices using less than 10s of a reference voice.
no code implementations • 14 Oct 2022 • Naoya Takahashi, Mayank Kumar Singh, Yuki Mitsufuji
Recent progress in deep generative models has improved the quality of neural vocoders in the speech domain.
no code implementations • 11 Oct 2022 • Kin Wai Cheuk, Ryosuke Sawata, Toshimitsu Uesaka, Naoki Murata, Naoya Takahashi, Shusuke Takahashi, Dorien Herremans, Yuki Mitsufuji
In this paper, we propose a novel generative approach, DiffRoll, to tackle automatic music transcription (AMT).
no code implementations • 9 Oct 2022 • Chieh-Hsin Lai, Yuhta Takida, Naoki Murata, Toshimitsu Uesaka, Yuki Mitsufuji, Stefano Ermon
Score-based generative models learn a family of noise-conditional score functions corresponding to the data density perturbed with increasingly large amounts of noise.
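Concretely, for each noise level sigma the regression target for the score network is the score of the Gaussian perturbation kernel, -(x_noisy - x) / sigma^2. A toy 1-D sketch of this noise-conditional setup (the data point and noise levels are illustrative):

```python
import random

# Sketch of the noise-conditional targets in score-based models: data are
# perturbed with increasingly large Gaussian noise, and for each sigma the
# score network regresses toward the perturbation kernel's score.

random.seed(0)

def perturb(x, sigma):
    return x + sigma * random.gauss(0.0, 1.0)

def kernel_score(x_noisy, x, sigma):
    # grad_{x_noisy} log N(x_noisy; x, sigma^2)
    return -(x_noisy - x) / sigma ** 2

sigmas = [0.1, 1.0, 10.0]   # increasingly large amounts of noise
targets = []
for sigma in sigmas:
    x = 0.5                  # a clean data point
    x_noisy = perturb(x, sigma)
    targets.append(kernel_score(x_noisy, x, sigma))
```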
1 code implementation • 24 Aug 2022 • Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Giorgio Fabbro, Stefan Uhlich, Chihiro Nagashima, Yuki Mitsufuji
Music mixing traditionally involves recording instruments in the form of clean, individual tracks and blending them into a final mixture using audio effects and expert knowledge (e.g., a mixing engineer).
2 code implementations • 4 Jun 2022 • Archontis Politis, Kazuki Shimada, Parthasaarathy Sudarsanam, Sharath Adavanne, Daniel Krause, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Yuki Mitsufuji, Tuomas Virtanen
Additionally, the report presents the baseline system that accompanies the dataset in the challenge, with emphasis on the differences from the baselines of the previous iterations: namely, the introduction of the multi-ACCDOA representation to handle multiple simultaneous occurrences of events of the same class, and support for additional improved input features for the microphone-array format.
Ranked #1 on Sound Event Localization and Detection on STARSS22
1 code implementation • 16 May 2022 • Yuhta Takida, Takashi Shibuya, Wei-Hsiang Liao, Chieh-Hsin Lai, Junki Ohmura, Toshimitsu Uesaka, Naoki Murata, Shusuke Takahashi, Toshiyuki Kumakura, Yuki Mitsufuji
In this paper, we propose a new training scheme that extends the standard VAE via novel stochastic dequantization and quantization, called stochastically quantized variational autoencoder (SQ-VAE).
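One simplified reading of stochastic quantization: instead of deterministically snapping an encoder output z to its nearest codebook vector, sample a code with probability proportional to exp(-||z - e_k||^2 / (2 s^2)). The codebook values and temperature below are illustrative, not the paper's parameterization:

```python
import math
import random

# Sketch of stochastic quantization: sample a codebook entry with
# probability proportional to exp(-||z - e_k||^2 / (2 s^2)).
# Codebook and temperature s are toy values for illustration.

random.seed(0)

def stochastic_quantize(z, codebook, s):
    logits = [-(z - e) ** 2 / (2 * s * s) for e in codebook]
    m = max(logits)                     # stabilize the softmax
    probs = [math.exp(l - m) for l in logits]
    total = sum(probs)
    probs = [p / total for p in probs]
    # sample a codebook index from the categorical distribution
    r, acc = random.random(), 0.0
    for k, p in enumerate(probs):
        acc += p
        if r <= acc:
            return codebook[k], probs
    return codebook[-1], probs

code, probs = stochastic_quantize(z=0.9, codebook=[0.0, 1.0, 2.0], s=0.5)
```

As s shrinks toward zero the distribution concentrates on the nearest code, recovering deterministic VQ-style quantization.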
no code implementations • 3 Feb 2022 • Johannes Imort, Giorgio Fabbro, Marco A. Martínez Ramírez, Stefan Uhlich, Yuichiro Koyama, Yuki Mitsufuji
Given the recent advances in music source separation and automatic mixing, removing audio effects in music tracks is a meaningful step toward developing an automated remixing system.
2 code implementations • 14 Oct 2021 • Kazuki Shimada, Yuichiro Koyama, Shusuke Takahashi, Naoya Takahashi, Emiru Tsunoo, Yuki Mitsufuji
The multi-ACCDOA format (a class- and track-wise output format) enables the model to handle cases with overlapping events from the same class.
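In ACCDOA-style outputs, each (track, class) slot carries a 3-D Cartesian vector whose norm encodes event activity and whose direction encodes the DOA. A toy decoder (shapes and the 0.5 activity threshold follow the ACCDOA convention but are illustrative here):

```python
import math

# Sketch of decoding a class- and track-wise ACCDOA-style output:
# vector norm above a threshold => active event; normalized vector => DOA.

N_TRACKS, N_CLASSES = 2, 3

def decode(output, threshold=0.5):
    events = []
    for t in range(N_TRACKS):
        for c in range(N_CLASSES):
            x, y, z = output[t][c]
            norm = math.sqrt(x * x + y * y + z * z)
            if norm > threshold:   # event of class c is active on track t
                events.append((t, c, (x / norm, y / norm, z / norm)))
    return events

# two tracks carry the same class at once: an overlap from the same class
output = [
    [(0.0, 0.0, 0.9), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(0.8, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
]
events = decode(output)
```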
no code implementations • 13 Oct 2021 • Bo-Yu Chen, Wei-Han Hsu, Wei-Hsiang Liao, Marco A. Martínez Ramírez, Yuki Mitsufuji, Yi-Hsuan Yang
A central task of a Disc Jockey (DJ) is to create a mixset of music with seamless transitions between adjacent tracks.
1 code implementation • 12 Oct 2021 • Ricardo Falcon-Perez, Kazuki Shimada, Yuichiro Koyama, Shusuke Takahashi, Yuki Mitsufuji
Data augmentation methods have shown great importance in diverse supervised learning problems where labeled data is scarce or costly to obtain.
1 code implementation • 31 Aug 2021 • Yuki Mitsufuji, Giorgio Fabbro, Stefan Uhlich, Fabian-Robert Stöter, Alexandre Défossez, Minseok Kim, Woosung Choi, Chin-Yun Yu, Kin-Wai Cheuk
The main differences compared with the past challenges are: 1) the competition is designed to more easily allow machine learning practitioners from other disciplines to participate; 2) evaluation is done on a hidden test set created by music professionals exclusively for the challenge to ensure its transparency, i.e., the test set is accessible to no one except the challenge organizers; and 3) the dataset provides a wider range of music genres and involves a greater number of mixing engineers.
no code implementations • 21 Jun 2021 • Kazuki Shimada, Naoya Takahashi, Yuichiro Koyama, Shusuke Takahashi, Emiru Tsunoo, Masafumi Takahashi, Yuki Mitsufuji
This report describes our systems submitted to the DCASE2021 challenge task 3: sound event localization and detection (SELD) with directional interference.
1 code implementation • CVPR 2021 • Naoya Takahashi, Yuki Mitsufuji
In this paper, we claim the importance of a dense simultaneous modeling of multiresolution representation and propose a novel CNN architecture called densely connected multidilated DenseNet (D3Net).
no code implementations • 26 May 2021 • Koichi Saito, Stefan Uhlich, Giorgio Fabbro, Yuki Mitsufuji
Furthermore, we propose a noise augmentation scheme for mixture-invariant training (MixIT), which enables its use in such scenarios as well.
Automatic Speech Recognition (ASR) +2
no code implementations • 17 Feb 2021 • Yuhta Takida, Wei-Hsiang Liao, Chieh-Hsin Lai, Toshimitsu Uesaka, Shusuke Takahashi, Yuki Mitsufuji
Variational autoencoders (VAEs) often suffer from posterior collapse, which is a phenomenon in which the learned latent space becomes uninformative.
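Posterior collapse is visible directly in the per-dimension KL term of a Gaussian VAE, KL(N(mu, var) || N(0, 1)) = 0.5 (mu^2 + var - log var - 1): when the encoder ignores the input (mu = 0, var = 1), the KL is exactly zero and the latent carries no information. A sketch with illustrative numbers:

```python
import math

# Closed-form per-dimension KL of a diagonal-Gaussian posterior against a
# standard-normal prior; zero KL indicates a collapsed (uninformative) latent.

def kl_to_standard_normal(mu, var):
    return 0.5 * (mu * mu + var - math.log(var) - 1.0)

collapsed = kl_to_standard_normal(mu=0.0, var=1.0)    # posterior == prior
informative = kl_to_standard_normal(mu=1.5, var=0.3)  # latent encodes something
```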
no code implementations • 18 Jan 2021 • Naoya Takahashi, Mayank Kumar Singh, Yuki Mitsufuji
Conventional singing voice conversion (SVC) methods often struggle to operate on high-resolution audio owing to the high dimensionality of the data.
no code implementations • 1 Jan 2021 • Yuhta Takida, Wei-Hsiang Liao, Toshimitsu Uesaka, Shusuke Takahashi, Yuki Mitsufuji
Variational autoencoders (VAEs) often suffer from posterior collapse, which is a phenomenon in which the learned latent space becomes uninformative.
1 code implementation • 21 Nov 2020 • Naoya Takahashi, Yuki Mitsufuji
In this paper, we claim the importance of a dense simultaneous modeling of multiresolution representation and propose a novel CNN architecture called densely connected multidilated DenseNet (D3Net).
Ranked #43 on Semantic Segmentation on Cityscapes test
2 code implementations • 29 Oct 2020 • Kazuki Shimada, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Yuki Mitsufuji
Conventional NN-based methods use two branches for a sound event detection (SED) target and a direction-of-arrival (DOA) target.
5 code implementations • 8 Oct 2020 • Ryosuke Sawata, Stefan Uhlich, Shusuke Takahashi, Yuki Mitsufuji
This paper proposes several improvements for music separation with deep neural networks (DNNs), namely a multi-domain loss (MDL) and two combination schemes.
Ranked #19 on Music Source Separation on MUSDB18
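One plausible form of such a multi-domain loss combines a time-domain error with a magnitude-spectrum error; the DFT-based spectral term and the 0.5 weighting below are illustrative choices, not the paper's exact formulation:

```python
import cmath

# Sketch of a multi-domain loss: time-domain MSE plus a weighted
# magnitude-spectrum MSE computed with a naive DFT.

def dft_mag(x):
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))) for k in range(n)]

def multi_domain_loss(estimate, target, spec_weight=0.5):
    time_term = sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(target)
    est_mag, tgt_mag = dft_mag(estimate), dft_mag(target)
    spec_term = sum((a - b) ** 2 for a, b in zip(est_mag, tgt_mag)) / len(target)
    return time_term + spec_weight * spec_term

loss_same = multi_domain_loss([1.0, 0.0, -1.0, 0.0], [1.0, 0.0, -1.0, 0.0])
loss_diff = multi_domain_loss([0.5, 0.5, 0.5, 0.5], [1.0, 0.0, -1.0, 0.0])
```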
no code implementations • 7 Oct 2020 • Naoya Takahashi, Shota Inoue, Yuki Mitsufuji
Despite the excellent performance of neural-network-based audio source separation methods and their wide range of applications, their robustness against intentional attacks has been largely neglected.
1 code implementation • 5 Oct 2020 • Naoya Takahashi, Yuki Mitsufuji
In this paper, we claim the importance of rapid receptive-field growth and simultaneous modeling of multi-resolution data in a single convolution layer, and propose a novel CNN architecture called densely connected dilated DenseNet (D3Net).
Ranked #11 on Music Source Separation on MUSDB18 (using extra training data)
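A minimal sketch of the "multidilated" idea: a single layer sums parallel convolutions with different dilation rates, so one layer sees several resolutions at once and its receptive field grows quickly. The kernel values and dilation set are illustrative, not D3Net's actual parameters:

```python
# Sketch of a multidilated 1-D convolution: sum parallel dilated
# convolutions (here with dilations 1, 2, 4) inside one layer.

def dilated_conv1d(x, kernel, dilation):
    half = (len(kernel) - 1) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - half) * dilation
            if 0 <= idx < len(x):      # implicit zero padding at the borders
                acc += w * x[idx]
        out.append(acc)
    return out

def multidilated_conv1d(x, kernel, dilations=(1, 2, 4)):
    outs = [dilated_conv1d(x, kernel, d) for d in dilations]
    return [sum(vals) for vals in zip(*outs)]

# response to a unit impulse shows taps at offsets 0, +/-1, +/-2, +/-4
y = multidilated_conv1d([0.0] * 4 + [1.0] + [0.0] * 4, kernel=[0.25, 0.5, 0.25])
```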
1 code implementation • 29 Nov 2019 • Naoya Takahashi, Mayank Kumar Singh, Sakya Basak, Parthasaarathy Sudarsanam, Sriram Ganapathy, Yuki Mitsufuji
Despite recent advances in voice separation methods, many challenges remain in realistic scenarios such as noisy recording and the limits of available data.
Automatic Speech Recognition (ASR) +3
1 code implementation • 7 Jul 2018 • Joachim Muth, Stefan Uhlich, Nathanael Perraudin, Thomas Kemp, Fabien Cardinaux, Yuki Mitsufuji
Music source separation with deep neural networks typically relies only on amplitude features.
1 code implementation • 7 May 2018 • Naoya Takahashi, Nabarun Goswami, Yuki Mitsufuji
Deep neural networks have become an indispensable technique for audio source separation (ASS).
Ranked #15 on Music Source Separation on MUSDB18 (using extra training data)
Music Source Separation • Sound • Audio and Speech Processing
3 code implementations • 29 Jun 2017 • Naoya Takahashi, Yuki Mitsufuji
This paper deals with the problem of audio source separation.