Search Results for author: Kohei Yatabe

Found 21 papers, 3 papers with code

Sampling-Frequency-Independent Universal Sound Separation

no code implementations · 22 Sep 2023 · Tomohiko Nakamura, Kohei Yatabe

Universal sound separation (USS) aims to separate arbitrary sources of different types, and can be a key technique for realizing a source separator that is universally usable as a preprocessor for any downstream task.

Versatile Time-Frequency Representations Realized by Convex Penalty on Magnitude Spectrogram

no code implementations · 3 Aug 2023 · Keidai Arai, Koki Yamada, Kohei Yatabe

Sparse time-frequency (T-F) representations have been an important research topic for several decades.
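The paper's specific convex penalty is not reproduced here; as a generic illustration of sparsity-promoting T-F processing, the sketch below applies soft thresholding (the proximal operator of a scaled ℓ1 norm) to STFT coefficients, assuming SciPy's `stft` and an illustrative threshold value:

```python
import numpy as np
from scipy.signal import stft

def soft_threshold(X, lam):
    # Proximal operator of lam*||.||_1 for complex T-F coefficients:
    # shrink magnitudes toward zero while keeping the phase.
    return np.exp(1j * np.angle(X)) * np.maximum(np.abs(X) - lam, 0.0)

rng = np.random.default_rng(0)
fs = 8000
x = np.sin(2 * np.pi * 440 * np.arange(fs) / fs) + 0.1 * rng.standard_normal(fs)
_, _, X = stft(x, fs=fs, nperseg=256)
Xs = soft_threshold(X, lam=0.01)  # small (mostly noise) bins become exactly zero
```

Bins whose magnitude falls below the threshold are set exactly to zero, which is what makes the resulting representation sparse rather than merely small.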

LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus

no code implementations · 30 May 2023 · Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Michiel Bacchiani, Yu Zhang, Wei Han, Ankur Bapna

The constituent samples of LibriTTS-R are identical to those of LibriTTS, with only the sound quality improved.

Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations

no code implementations · 3 Mar 2023 · Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Yu Zhang, Wei Han, Ankur Bapna, Michiel Bacchiani

Experiments show that Miipher (i) is robust against various types of audio degradation and (ii) enables us to train a high-quality text-to-speech (TTS) model from restored speech samples collected from the web.

Speech Denoising, Speech Enhancement

WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration

no code implementations · 3 Oct 2022 · Yuma Koizumi, Kohei Yatabe, Heiga Zen, Michiel Bacchiani

DDPMs and GANs are characterized by an iterative denoising framework and adversarial training, respectively.


SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping

no code implementations · 31 Mar 2022 · Yuma Koizumi, Heiga Zen, Kohei Yatabe, Nanxin Chen, Michiel Bacchiani

Neural vocoders based on the denoising diffusion probabilistic model (DDPM) have been improved by adapting the diffusion noise distribution to given acoustic features.

Denoising, Speech Enhancement

APPLADE: Adjustable Plug-and-play Audio Declipper Combining DNN with Sparse Optimization

no code implementations · 16 Feb 2022 · Tomoro Tanaka, Kohei Yatabe, Masahiro Yasuda, Yasuhiro Oikawa

Still, they cannot perform well if the training data are mismatched and/or time-domain constraints are not imposed.

Audio declipping

Design of Tight Minimum-Sidelobe Windows by Riemannian Newton's Method

no code implementations · 2 Nov 2021 · Daichi Kitahara, Kohei Yatabe

The short-time Fourier transform (STFT), or the discrete Gabor transform (DGT), has been extensively used in signal analysis and processing.
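As a small numerical illustration of what "tight" means for an STFT window (this is only the tightness condition, not the paper's Riemannian Newton construction), the square-root Hann window at 50% overlap satisfies the tight-frame condition that the squared window summed over all hops is constant:

```python
import numpy as np
from scipy.signal.windows import hann

# A window w is tight for STFT with hop a when the squared window summed
# over all hop shifts is constant: sum_k w[n - k*a]^2 = const for all n.
N, hop = 512, 256
w = np.sqrt(hann(N, sym=False))            # periodic (DFT-even) sqrt-Hann
overlap_sum = w[:hop] ** 2 + w[hop:] ** 2  # two overlapping hops cover each sample
```

Here `overlap_sum` is constant (equal to 1), which is why this window allows perfect reconstruction with itself as the synthesis window.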

Sampling-Frequency-Independent Audio Source Separation Using Convolution Layer Based on Impulse Invariant Method

1 code implementation · 10 May 2021 · Koichi Saito, Tomohiko Nakamura, Kohei Yatabe, Yuma Koizumi, Hiroshi Saruwatari

Audio source separation is often used as preprocessing for various applications, and one of its ultimate goals is to construct a single versatile model capable of dealing with a variety of audio signals.

Audio Source Separation, Music Source Separation
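A minimal sketch of the impulse invariant idea named in the title, using a hypothetical first-order analog prototype (a decaying exponential, not the paper's learned filters): the discrete kernel is the continuous-time impulse response sampled at interval T = 1/fs and scaled by T, so one analog prototype yields mutually consistent kernels at any sampling frequency.

```python
import numpy as np

def impulse_invariant_kernel(fs, tau=1e-3, length_sec=8e-3):
    # Impulse invariant method: sample the analog impulse response
    # h_c(t) = exp(-t / tau) at interval T = 1/fs and scale by T.
    T = 1.0 / fs
    t = np.arange(0.0, length_sec, T)
    return T * np.exp(-t / tau)

h16 = impulse_invariant_kernel(16000)  # kernel for 16 kHz input
h8 = impulse_invariant_kernel(8000)    # same analog prototype at 8 kHz
```

Because both kernels come from the same analog prototype, the 16 kHz kernel sampled at every other tap matches the 8 kHz kernel up to the T scaling, and both approximate the same analog frequency response at DC.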

Sparse time-frequency representation via atomic norm minimization

no code implementations · 7 May 2021 · Tsubasa Kusano, Kohei Yatabe, Yasuhiro Oikawa

In this paper, we propose a method of estimating a sparse T-F representation using the atomic norm.

Noisy-target Training: A Training Strategy for DNN-based Speech Enhancement without Clean Speech

no code implementations · 21 Jan 2021 · Takuya Fujimura, Yuma Koizumi, Kohei Yatabe, Ryoichi Miyazaki

This requirement currently restricts the amount of training data for speech enhancement to less than 1/1000 of that of speech recognition, which does not need clean signals.

Speech Enhancement, speech-recognition, +1

Self-supervised Neural Audio-Visual Sound Source Localization via Probabilistic Spatial Modeling

no code implementations · 28 Jul 2020 · Yoshiki Masuyama, Yoshiaki Bando, Kohei Yatabe, Yoko Sasaki, Masaki Onishi, Yasuhiro Oikawa

By incorporating the spatial information in multichannel audio signals, our method trains deep neural networks (DNNs) to distinguish multiple sound source objects.

Self-Supervised Learning

Gamma Boltzmann Machine for Simultaneously Modeling Linear- and Log-amplitude Spectra

no code implementations · 24 Jun 2020 · Toru Nakashika, Kohei Yatabe

Its conditional distribution of the observable data is given by the gamma distribution, and thus the proposed RBM can naturally handle data represented by positive numbers, such as amplitude spectra.

Consistent ICA: Determined BSS meets spectrogram consistency

no code implementations · 20 May 2020 · Kohei Yatabe

Multichannel audio blind source separation (BSS) in the determined situation, where the number of microphones equals the number of sources (determined BSS), is performed by multichannel linear filtering in the time-frequency domain to handle the convolutive mixing process.
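The spectrogram consistency invoked in the title can be seen numerically (a generic sketch using SciPy's STFT, not the paper's ICA algorithm): an arbitrary complex array is generally not the STFT of any time signal, and applying ISTFT followed by STFT projects it onto the set of consistent spectrograms; that projection is idempotent.

```python
import numpy as np
from scipy.signal import stft, istft

rng = np.random.default_rng(0)
# An arbitrary complex array (129 frequency bins matches nperseg=256) is
# generally an inconsistent spectrogram: no time signal has this STFT.
Z = rng.standard_normal((129, 40)) + 1j * rng.standard_normal((129, 40))

_, z = istft(Z, nperseg=256)       # back to the time domain
_, _, Z1 = stft(z, nperseg=256)    # first projection onto consistent spectrograms
_, z1 = istft(Z1, nperseg=256)
_, _, Z2 = stft(z1, nperseg=256)   # projecting again changes nothing
```

`Z1` and `Z2` coincide up to floating-point error, confirming that the ISTFT-then-STFT map is a projection.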

Determined BSS based on time-frequency masking and its application to harmonic vector analysis

no code implementations · 29 Apr 2020 · Kohei Yatabe, Daichi Kitamura

This paper proposes harmonic vector analysis (HVA) based on a general algorithmic framework of audio blind source separation (BSS) that is also presented in this paper.

Stable Training of DNN for Speech Enhancement based on Perceptually-Motivated Black-Box Cost Function

no code implementations · 14 Feb 2020 · Masaki Kawanaka, Yuma Koizumi, Ryoichi Miyazaki, Kohei Yatabe

For evaluating the subjective quality, several methods related to perceptually-motivated objective sound quality assessment (OSQA) have been proposed such as PESQ (perceptual evaluation of speech quality).

Speech Enhancement

Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention

no code implementations · 14 Feb 2020 · Yuma Koizumi, Kohei Yatabe, Marc Delcroix, Yoshiki Masuyama, Daiki Takeuchi

This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features; we extract a speaker representation used for adaptation directly from the test utterance.

Multi-Task Learning, Speaker Identification, +3

Phase reconstruction based on recurrent phase unwrapping with deep neural networks

no code implementations · 14 Feb 2020 · Yoshiki Masuyama, Kohei Yatabe, Yuma Koizumi, Yasuhiro Oikawa, Noboru Harada

In the proposed method, DNNs estimate phase derivatives instead of the phase itself, which allows us to avoid the sensitivity problem.
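The sensitivity issue can be seen with a toy example (a schematic sketch, not the paper's DNN-based method): the principal value of a phase difference is well defined even where the phase itself wraps, so integrating an estimated derivative recovers the unwrapped phase.

```python
import numpy as np

# A monotonically growing phase that wraps many times in its principal value.
true_phase = np.cumsum(0.3 + 0.1 * np.sin(np.linspace(0, 4, 200)))
wrapped = np.angle(np.exp(1j * true_phase))  # observed principal values in (-pi, pi]

# Principal value of the finite difference equals the true increment
# whenever the increment is smaller than pi in magnitude.
deriv = np.angle(np.exp(1j * np.diff(wrapped)))
recovered = wrapped[0] + np.concatenate([[0.0], np.cumsum(deriv)])
```

The wrapped phase jumps by 2π repeatedly, but `recovered` matches `true_phase`, because the derivative never sees the wraps.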

Invertible DNN-based nonlinear time-frequency transform for speech enhancement

1 code implementation · 25 Nov 2019 · Daiki Takeuchi, Kohei Yatabe, Yuma Koizumi, Yasuhiro Oikawa, Noboru Harada

Therefore, some end-to-end methods use a DNN to learn a linear T-F transform, which is much easier to interpret.

Audio and Speech Processing, Sound

Deep Griffin-Lim Iteration

no code implementations · 10 Mar 2019 · Yoshiki Masuyama, Kohei Yatabe, Yuma Koizumi, Yasuhiro Oikawa, Noboru Harada

This paper presents a novel method for phase reconstruction from a given amplitude spectrogram alone, combining a signal-processing-based approach with a deep neural network (DNN).
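The signal-processing side of this combination is the classic Griffin-Lim algorithm, which alternates between restoring the target magnitude and projecting onto consistent spectrograms via ISTFT-then-STFT; the paper inserts a DNN into this loop, but the plain iteration can be sketched as follows (parameters such as `n_iter` and `nperseg` are illustrative):

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(target_mag, n_iter=30, nperseg=256):
    # Start from a random phase, then alternate two projections:
    # (i) STFT(ISTFT(.)) -> nearest consistent spectrogram,
    # (ii) keep its phase but restore the target magnitude.
    rng = np.random.default_rng(0)
    X = target_mag * np.exp(1j * rng.uniform(-np.pi, np.pi, target_mag.shape))
    for _ in range(n_iter):
        _, x = istft(X, nperseg=nperseg)
        _, _, C = stft(x, nperseg=nperseg)
        X = target_mag * np.exp(1j * np.angle(C))
    return istft(X, nperseg=nperseg)[1]

fs = 8000
x_true = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)
A = np.abs(stft(x_true, nperseg=256)[2])  # amplitude spectrogram to invert
x_rec = griffin_lim(A)
```

A known property of this iteration is that the distance between the target magnitude and the magnitude of the current estimate is non-increasing, so more iterations never make the spectrogram fit worse.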
