Search Results for author: Naoya Takahashi

Found 26 papers, 14 papers with code

The Whole Is Greater than the Sum of Its Parts: Improving DNN-based Music Source Separation

1 code implementation • 13 May 2023 • Ryosuke Sawata, Naoya Takahashi, Stefan Uhlich, Shusuke Takahashi, Yuki Mitsufuji

We modify the target network, i.e., the network architecture of the original DNN-based MSS, by adding bridging paths between the output instruments so that they can share information.
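The bridging idea can be sketched as follows. This is an illustrative toy, not the paper's actual architecture: the function name, the use of plain feature vectors, and the choice of averaging the other branches are all assumptions.

```python
# Hypothetical sketch of "bridging paths" between per-instrument branches:
# each branch's intermediate feature is augmented with the element-wise mean
# of the other branches' features before its output layer.

def bridge(features):
    """features: dict mapping instrument name -> feature vector (list of floats).
    Returns bridged features where each branch also sees the mean of the others."""
    names = list(features)
    bridged = {}
    for name in names:
        others = [features[o] for o in names if o != name]
        # element-wise mean over the other branches
        mean_other = [sum(vals) / len(others) for vals in zip(*others)]
        bridged[name] = [x + m for x, m in zip(features[name], mean_other)]
    return bridged

feats = {"vocals": [1.0, 2.0], "drums": [3.0, 4.0], "bass": [5.0, 6.0]}
out = bridge(feats)
print(out["vocals"])  # → [5.0, 7.0]: vocals plus mean of drums and bass
```

In a real MSS network the bridged quantity would be a learned combination of deep feature maps rather than a raw mean, but the information flow between otherwise independent instrument branches is the same.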

Music Source Separation

MMDenseLSTM: An efficient combination of convolutional and recurrent neural networks for audio source separation

1 code implementation • 7 May 2018 • Naoya Takahashi, Nabarun Goswami, Yuki Mitsufuji

Deep neural networks have become an indispensable technique for audio source separation (ASS).

Ranked #17 on Music Source Separation on MUSDB18 (using extra training data)

Music Source Separation · Sound · Audio and Speech Processing

D3Net: Densely connected multidilated DenseNet for music source separation

1 code implementation • 5 Oct 2020 • Naoya Takahashi, Yuki Mitsufuji

In this paper, we claim the importance of rapidly growing the receptive field and simultaneously modeling multi-resolution data in a single convolution layer, and propose a novel CNN architecture called densely connected dilated DenseNet (D3Net).
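The core idea of combining several resolutions in a single layer can be sketched with a toy 1-D version. This is a minimal pure-Python illustration under my own simplifying assumptions (shared kernel across dilations, summed outputs), not the D3Net implementation:

```python
def dilated_conv1d(x, kernel, dilation):
    """'Same'-padded 1-D dilated convolution with zero padding (toy sketch)."""
    center = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - center) * dilation  # dilation spreads the taps apart
            if 0 <= j < len(x):
                acc += w * x[j]
        out.append(acc)
    return out

def multidilated_layer(x, kernel, dilations=(1, 2, 4)):
    """One 'multidilated' layer: parallel dilated convolutions summed, so a
    single layer sees several resolutions at once (illustrative assumption)."""
    outs = [dilated_conv1d(x, kernel, d) for d in dilations]
    return [sum(vals) for vals in zip(*outs)]

x = [0.0] * 7
x[3] = 1.0  # unit impulse
print(multidilated_layer(x, [1.0, 1.0, 1.0]))  # → [0.0, 1.0, 1.0, 3.0, 1.0, 1.0, 0.0]
```

Feeding an impulse through the layer shows the point: the response covers near taps (dilation 1) and far taps (dilations 2 and 4) at the same time, so the receptive field grows rapidly without stacking layers.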

Ranked #12 on Music Source Separation on MUSDB18 (using extra training data)

Music Source Separation

Densely connected multidilated convolutional networks for dense prediction tasks

1 code implementation • 21 Nov 2020 • Naoya Takahashi, Yuki Mitsufuji

In this paper, we claim the importance of densely and simultaneously modeling multiresolution representations, and propose a novel CNN architecture called densely connected multidilated DenseNet (D3Net).

Audio Source Separation · Music Source Separation +1

Densely Connected Multi-Dilated Convolutional Networks for Dense Prediction Tasks

1 code implementation • CVPR 2021 • Naoya Takahashi, Yuki Mitsufuji

In this paper, we claim the importance of densely and simultaneously modeling multiresolution representations, and propose a novel CNN architecture called densely connected multidilated DenseNet (D3Net).

Audio Source Separation · Semantic Segmentation

AENet: Learning Deep Audio Features for Video Analysis

1 code implementation • 3 Jan 2017 • Naoya Takahashi, Michael Gygli, Luc van Gool

Instead, combining visual features with our AENet features, which can be computed efficiently on a GPU, leads to significant performance improvements on action recognition and video highlight detection.

Action Recognition · Data Augmentation +4

STARSS22: A dataset of spatial recordings of real scenes with spatiotemporal annotations of sound events

2 code implementations • 4 Jun 2022 • Archontis Politis, Kazuki Shimada, Parthasaarathy Sudarsanam, Sharath Adavanne, Daniel Krause, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Yuki Mitsufuji, Tuomas Virtanen

Additionally, the report presents the baseline system that accompanies the dataset in the challenge, with emphasis on the differences from the baselines of the previous iterations; namely, the introduction of the multi-ACCDOA representation to handle multiple simultaneous occurrences of events of the same class, and support for additional improved input features for the microphone array format.
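To make the multi-ACCDOA idea concrete: in an ACCDOA output, each class is predicted as a Cartesian vector whose norm encodes activity and whose direction encodes the direction of arrival; multi-ACCDOA predicts several such track vectors per class, so two simultaneous events of the same class decode to two DOAs. The sketch below is an illustrative decoder under my own assumptions (the 0.5 activity threshold and function names are not from the paper):

```python
import math

def decode_accdoa(vec, threshold=0.5):
    """Decode one ACCDOA vector: the vector norm is read as class activity,
    the unit direction as DOA. The threshold value is an assumption."""
    norm = math.sqrt(sum(v * v for v in vec))
    active = norm > threshold
    doa = [v / norm for v in vec] if norm > 0 else [0.0, 0.0, 0.0]
    return active, doa

def decode_multi_accdoa(track_vecs, threshold=0.5):
    """Multi-ACCDOA: several tracks per class; return the DOA of each
    track whose activity exceeds the threshold."""
    events = []
    for vec in track_vecs:
        active, doa = decode_accdoa(vec, threshold)
        if active:
            events.append(doa)
    return events

# Two tracks for one class: only the first is active (norm 1.0 vs 0.1).
print(decode_multi_accdoa([[0.8, 0.0, 0.6], [0.1, 0.0, 0.0]]))
```

The real baseline additionally resolves track permutations during training (auxiliary duplicating permutation-invariant training), which this sketch omits.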

Sound Event Localization and Detection

End-to-end Lyrics Recognition with Voice to Singing Style Transfer

1 code implementation • 17 Feb 2021 • Sakya Basak, Shrutina Agarwal, Sriram Ganapathy, Naoya Takahashi

This approach, called voice to singing (V2S), performs the voice style conversion by modulating the F0 contour of the natural speech with that of a singing voice.
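The F0-modulation step can be sketched in miniature. This is a hand-rolled illustration under stated assumptions (linear resampling of per-frame F0 contours and median-based rescaling; the paper's actual conversion is performed by a neural model, and the function names here are invented):

```python
def resample_contour(contour, n):
    """Linearly resample a per-frame F0 contour to n frames."""
    if n == 1:
        return [contour[0]]
    out = []
    for i in range(n):
        pos = i * (len(contour) - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, len(contour) - 1)
        frac = pos - lo
        out.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return out

def v2s_f0_transplant(speech_f0, singing_f0):
    """Toy V2S-style step: impose the singing F0 contour shape on the speech,
    rescaled so the median pitch matches the original speaker's."""
    shape = resample_contour(singing_f0, len(speech_f0))
    med_speech = sorted(speech_f0)[len(speech_f0) // 2]
    med_sing = sorted(shape)[len(shape) // 2]
    scale = med_speech / med_sing
    return [f * scale for f in shape]

# Flat-ish speech contour takes on the singing contour's rise and fall.
print(v2s_f0_transplant([100.0, 110.0, 120.0, 110.0, 100.0],
                        [200.0, 300.0, 200.0]))
```

The point of the sketch is the data-augmentation angle: speech with a singing-style F0 contour can stand in for scarce sung lyrics data when training the recognizer.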

Data Augmentation · Language Modelling +2

CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos

1 code implementation • 14 Dec 2022 • Hao-Wen Dong, Naoya Takahashi, Yuki Mitsufuji, Julian McAuley, Taylor Berg-Kirkpatrick

Further, videos in the wild often contain off-screen sounds and background noise that may hinder the model from learning the desired audio-textual correspondence.

STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events

1 code implementation • NeurIPS 2023 • Kazuki Shimada, Archontis Politis, Parthasaarathy Sudarsanam, Daniel Krause, Kengo Uchida, Sharath Adavanne, Aapo Hakala, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Tuomas Virtanen, Yuki Mitsufuji

While direction of arrival (DOA) of sound events is generally estimated from multichannel audio data recorded in a microphone array, sound events usually derive from visually perceptible source objects, e.g., sounds of footsteps come from the feet of a walker.
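As background for how DOA falls out of multichannel audio: for a two-microphone array, the inter-channel time delay fixes the arrival angle. The sketch below is a minimal stand-in (brute-force cross-correlation instead of the GCC-PHAT typically used, with assumed sample rate and mic spacing), not anything from the dataset's baseline:

```python
import math

def tdoa_samples(sig_a, sig_b, max_lag):
    """Estimate the delay (in samples) of sig_b relative to sig_a by
    brute-force cross-correlation over lags in [-max_lag, max_lag]."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, a in enumerate(sig_a):
            j = i + lag
            if 0 <= j < len(sig_b):
                score += a * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def doa_from_tdoa(lag, fs=16000, mic_dist=0.1, c=343.0):
    """Convert a sample delay into a broadside arrival angle (degrees) for a
    2-mic array; fs, mic_dist, and c are illustrative values."""
    tau = lag / fs
    s = max(-1.0, min(1.0, c * tau / mic_dist))
    return math.degrees(math.asin(s))
```

The audio-visual angle of STARSS23 is then intuitive: the visually perceptible source object (e.g., the walker's feet) offers an independent cue for the same angle that the array geometry recovers acoustically.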

Sound Event Localization and Detection

Improving Voice Separation by Incorporating End-to-end Speech Recognition

1 code implementation • 29 Nov 2019 • Naoya Takahashi, Mayank Kumar Singh, Sakya Basak, Parthasaarathy Sudarsanam, Sriram Ganapathy, Yuki Mitsufuji

Despite recent advances in voice separation methods, many challenges remain in realistic scenarios such as noisy recording and the limits of available data.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +3

Automatic Pronunciation Generation by Utilizing a Semi-supervised Deep Neural Networks

no code implementations • 15 Jun 2016 • Naoya Takahashi, Tofigh Naghibi, Beat Pfister

Phonemic or phonetic sub-word units are the most commonly used atomic elements to represent speech signals in modern ASRs.

speech-recognition · Speech Recognition

Adversarial attacks on audio source separation

no code implementations • 7 Oct 2020 • Naoya Takahashi, Shota Inoue, Yuki Mitsufuji

Despite the excellent performance of neural-network-based audio source separation methods and their wide range of applications, their robustness against intentional attacks has been largely neglected.
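The attack setting can be sketched with a fast-gradient-sign-style perturbation on a toy "separator". Everything here is an assumption for illustration: the real attacks in the paper differentiate through a DNN separator, whereas this toy applies a fixed element-wise mask so the gradient can be written by hand:

```python
def fgsm_attack(x, mask, target, eps=0.01):
    """FGSM-style attack sketch on a toy separator y = mask * x (element-wise).
    Loss = MSE(mask * x, target); d(loss)/dx_i = 2 * m_i * (m_i*x_i - t_i) / n.
    Perturbing along the gradient sign maximizes the separation error."""
    n = len(x)
    grad = [2.0 * m * (m * xi - t) / n for xi, m, t in zip(x, mask, target)]
    sign = [1.0 if g > 0 else (-1.0 if g < 0 else 0.0) for g in grad]
    return [xi + eps * s for xi, s in zip(x, sign)]

x_adv = fgsm_attack([1.0, -1.0], [1.0, 1.0], [0.0, 0.0], eps=0.01)
print(x_adv)  # → [1.01, -1.01]: each sample nudged to increase the loss
```

The perturbation is bounded by eps per sample, which is why such attacks can be nearly inaudible while still degrading the separated output.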

Adversarial Attack · Audio Source Separation

Hierarchical disentangled representation learning for singing voice conversion

no code implementations • 18 Jan 2021 • Naoya Takahashi, Mayank Kumar Singh, Yuki Mitsufuji

Conventional singing voice conversion (SVC) methods often struggle to operate on high-resolution audio owing to the high dimensionality of the data.

Representation Learning · Voice Conversion

Hierarchical Diffusion Models for Singing Voice Neural Vocoder

no code implementations • 14 Oct 2022 • Naoya Takahashi, Mayank Kumar Singh, Yuki Mitsufuji

Recent progress in deep generative models has improved the quality of neural vocoders in the speech domain.

Robust One-Shot Singing Voice Conversion

no code implementations • 20 Oct 2022 • Naoya Takahashi, Mayank Kumar Singh, Yuki Mitsufuji

We then propose a two-stage training method called Robustify, which trains the one-shot SVC model on clean data in the first stage to ensure high-quality conversion, and in the second stage introduces enhancement modules into the model's encoders to improve feature extraction from distorted singing voices.

Voice Conversion

Nonparallel Emotional Voice Conversion For Unseen Speaker-Emotion Pairs Using Dual Domain Adversarial Network & Virtual Domain Pairing

no code implementations • 21 Feb 2023 • Nirmesh Shah, Mayank Kumar Singh, Naoya Takahashi, Naoyuki Onoe

The primary goal of an emotional voice conversion (EVC) system is to convert the emotion of a given speech signal from one style to another without modifying the linguistic content of the signal.

Voice Conversion

Cross-modal Face- and Voice-style Transfer

no code implementations • 27 Feb 2023 • Naoya Takahashi, Mayank K. Singh, Yuki Mitsufuji

Image-to-image translation and voice conversion enable the generation of a new facial image and voice while maintaining some of the semantics, such as the pose in an image and the linguistic content in audio, respectively.

Image-to-Image Translation · Open-Ended Question Answering +3

Iteratively Improving Speech Recognition and Voice Conversion

no code implementations • 24 May 2023 • Mayank Kumar Singh, Naoya Takahashi, Naoyuki Onoe

Many existing works on voice conversion (VC) tasks use automatic speech recognition (ASR) models for ensuring linguistic consistency between source and converted samples.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +3
