no code implementations • 22 Sep 2023 • Tomohiko Nakamura, Kohei Yatabe
USS aims to separate arbitrary sources of different types, and it can be the key technique for realizing a source separator that is universally usable as a preprocessor for any downstream task.
no code implementations • 3 Aug 2023 • Keidai Arai, Koki Yamada, Kohei Yatabe
Sparse time-frequency (T-F) representations have been an important research topic for several decades.
no code implementations • 30 May 2023 • Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Michiel Bacchiani, Yu Zhang, Wei Han, Ankur Bapna
The constituent samples of LibriTTS-R are identical to those of LibriTTS, with only the sound quality improved.
no code implementations • 3 Mar 2023 • Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Yu Zhang, Wei Han, Ankur Bapna, Michiel Bacchiani
Experiments show that Miipher (i) is robust against various types of audio degradation and (ii) enables us to train a high-quality text-to-speech (TTS) model from restored speech samples collected from the Web.
no code implementations • 3 Oct 2022 • Yuma Koizumi, Kohei Yatabe, Heiga Zen, Michiel Bacchiani
DDPMs and GANs can be characterized by the iterative denoising framework and adversarial training, respectively.
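As a rough illustration of the iterative denoising framework mentioned above (a generic DDPM reverse step, not the paper's model; `eps_model` and the schedule arrays are hypothetical placeholders):

```python
import numpy as np

def ddpm_reverse_step(x_t, t, eps_model, alphas, alphas_bar):
    """One generic DDPM denoising step: predict the noise, form the
    posterior mean, then add fresh Gaussian noise (except at t = 0)."""
    eps = eps_model(x_t, t)                      # hypothetical noise predictor
    a_t, ab_t = alphas[t], alphas_bar[t]
    mean = (x_t - (1.0 - a_t) / np.sqrt(1.0 - ab_t) * eps) / np.sqrt(a_t)
    if t == 0:
        return mean
    sigma = np.sqrt(1.0 - a_t)                   # a simple variance choice
    return mean + sigma * np.random.randn(*x_t.shape)
```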
no code implementations • 31 Mar 2022 • Yuma Koizumi, Heiga Zen, Kohei Yatabe, Nanxin Chen, Michiel Bacchiani
Neural vocoders using the denoising diffusion probabilistic model (DDPM) have been improved by adapting the diffusion noise distribution to given acoustic features.
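A minimal sketch of the idea of adapting the diffusion noise to acoustic features: standard Gaussian noise is rescaled by a feature-dependent standard deviation. The mapping from mel frames to `sigma` below is a made-up placeholder, not the method of the paper.

```python
import numpy as np

def adapted_noise(mel, hop=256, eps=1e-5):
    """Draw Gaussian noise whose per-sample standard deviation follows a
    crude energy estimate taken from the given mel-spectrogram frames."""
    sigma_frame = np.sqrt(mel.mean(axis=0) + eps)  # one sigma per frame
    sigma = np.repeat(sigma_frame, hop)            # upsample to sample rate
    return sigma * np.random.randn(sigma.size)     # N(0, diag(sigma^2))
```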
1 code implementation • 17 Feb 2022 • Kento Nagatomo, Masahiro Yasuda, Kohei Yatabe, Shoichiro Saito, Yasuhiro Oikawa
Sound event localization and detection (SELD) is a combined task of identifying sound events and their directions.
no code implementations • 16 Feb 2022 • Tomoro Tanaka, Kohei Yatabe, Masahiro Yasuda, Yasuhiro Oikawa
Still, they cannot perform well if the training data are mismatched and/or time-domain constraints are not imposed.
no code implementations • 2 Nov 2021 • Daichi Kitahara, Kohei Yatabe
The short-time Fourier transform (STFT), or the discrete Gabor transform (DGT), has been extensively used in signal analysis and processing.
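For reference, the STFT and its inverse are available in standard tools; a minimal SciPy example (perfect reconstruction holds because the default Hann window at 75% overlap satisfies the COLA condition):

```python
import numpy as np
from scipy.signal import stft, istft

fs = 16000
x = np.random.randn(fs)                               # stand-in 1 s signal
f, t, X = stft(x, fs=fs, nperseg=512, noverlap=384)   # T-F coefficients
_, x_rec = istft(X, fs=fs, nperseg=512, noverlap=384)
print(np.allclose(x, x_rec[: x.size]))                # True: reconstruction
```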
1 code implementation • 10 May 2021 • Koichi Saito, Tomohiko Nakamura, Kohei Yatabe, Yuma Koizumi, Hiroshi Saruwatari
Audio source separation is often used as preprocessing for various applications, and one of its ultimate goals is to construct a single versatile model capable of dealing with a wide variety of audio signals.
no code implementations • 7 May 2021 • Tsubasa Kusano, Kohei Yatabe, Yasuhiro Oikawa
In this paper, we propose a method of estimating a sparse T-F representation using the atomic norm.
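The atomic-norm formulation itself does not fit in a few lines, but as a simpler stand-in for sparsity-promoting T-F estimation, the following ISTA sketch fits l1-regularized STFT coefficients; it illustrates the general idea only, not the proposed atomic-norm method.

```python
import numpy as np
from scipy.signal import stft, istft

def soft(z, tau):
    """Complex soft-thresholding (the proximal operator of the l1 norm)."""
    return np.maximum(np.abs(z) - tau, 0.0) * np.exp(1j * np.angle(z))

def sparse_tf(x, lam=0.05, mu=0.5, n_iter=50, nperseg=256, noverlap=192):
    """ISTA for an l1-penalized STFT coefficient fit (a crude proxy for
    sparse T-F estimation; istft is used as an approximate adjoint)."""
    _, _, C = stft(x, nperseg=nperseg, noverlap=noverlap)
    for _ in range(n_iter):
        _, x_hat = istft(C, nperseg=nperseg, noverlap=noverlap)
        r = x - x_hat[: x.size]                        # time-domain residual
        _, _, G = stft(r, nperseg=nperseg, noverlap=noverlap)
        C = soft(C + mu * G, mu * lam)                 # gradient + shrinkage
    return C
```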
no code implementations • 21 Jan 2021 • Takuya Fujimura, Yuma Koizumi, Kohei Yatabe, Ryoichi Miyazaki
This requirement currently restricts the amount of training data for speech enhancement to less than 1/1000 of that of speech recognition, which does not need clean signals.
no code implementations • 28 Jul 2020 • Yoshiki Masuyama, Yoshiaki Bando, Kohei Yatabe, Yoko Sasaki, Masaki Onishi, Yasuhiro Oikawa
By incorporating the spatial information in multichannel audio signals, our method trains deep neural networks (DNNs) to distinguish multiple sound-source objects.
no code implementations • 24 Jun 2020 • Toru Nakashika, Kohei Yatabe
Its conditional distribution of the observable data is given by the gamma distribution, and thus the proposed RBM can naturally handle data represented by positive numbers, such as amplitude spectra.
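For intuition, a gamma-distributed conditional over strictly positive amplitude values can be sampled as below; the shape/scale parameterization is generic, not the specific conditional of the proposed RBM.

```python
import numpy as np

rng = np.random.default_rng(0)
k, theta = 2.0, 0.5                           # generic shape and scale
amplitudes = rng.gamma(k, theta, size=1000)   # strictly positive samples
print(amplitudes.min() > 0)                   # support matches amplitudes
```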
no code implementations • 20 May 2020 • Kohei Yatabe
Multichannel audio blind source separation (BSS) in the determined situation (i.e., the number of microphones equals the number of sources), or determined BSS, is performed by multichannel linear filtering in the time-frequency domain to handle the convolutive mixing process.
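In the determined case, separation amounts to applying a square demixing matrix in every frequency bin; a minimal sketch of the filtering step (how the matrices are estimated, e.g., by an ICA-type optimization, is the hard part and is not shown):

```python
import numpy as np

def demix(X, W):
    """Apply per-frequency demixing matrices.

    X : (freq, time, mic) multichannel STFT of the observed mixture
    W : (freq, src, mic)  demixing matrix for each frequency bin
    Returns (freq, time, src) separated T-F coefficients.
    """
    return np.einsum('fsm,ftm->fts', W, X)
```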
no code implementations • 29 Apr 2020 • Kohei Yatabe, Daichi Kitamura
This paper proposes harmonic vector analysis (HVA) based on a general algorithmic framework of audio blind source separation (BSS) that is also presented in this paper.
no code implementations • 14 Feb 2020 • Masaki Kawanaka, Yuma Koizumi, Ryoichi Miyazaki, Kohei Yatabe
To evaluate subjective quality, several methods related to perceptually motivated objective sound quality assessment (OSQA) have been proposed, such as PESQ (perceptual evaluation of speech quality).
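For example, PESQ scores can be computed with the open-source `pesq` package (an assumption for illustration, not necessarily the authors' tooling; the file names are placeholders for 16 kHz speech):

```python
import soundfile as sf
from pesq import pesq  # pip install pesq

ref, fs = sf.read('clean.wav')     # hypothetical clean reference speech
deg, _ = sf.read('enhanced.wav')   # hypothetical enhanced/degraded speech
score = pesq(fs, ref, deg, 'wb')   # wide-band PESQ, roughly -0.5 to 4.5
print(score)
```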
no code implementations • 14 Feb 2020 • Yuma Koizumi, Kohei Yatabe, Marc Delcroix, Yoshiki Masuyama, Daiki Takeuchi
This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features; we extract a speaker representation used for adaptation directly from the test utterance.
no code implementations • 14 Feb 2020 • Yoshiki Masuyama, Kohei Yatabe, Yuma Koizumi, Yasuhiro Oikawa, Noboru Harada
In the proposed method, DNNs estimate phase derivatives instead of the phase itself, which allows us to avoid the sensitivity problem.
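To see why derivatives suffice, the phase can be recovered up to a per-bin constant by integrating its estimated time derivative across frames; a minimal sketch assuming a DNN output `dphase_dt`:

```python
import numpy as np

def integrate_phase(dphase_dt, hop, fs):
    """Recover phase (up to a per-bin offset) by cumulatively summing the
    estimated time derivative of the phase over frames.

    dphase_dt : (freq, frames) phase time-derivative in rad/s
    """
    dt = hop / fs                                # time step between frames
    phase = np.cumsum(dphase_dt * dt, axis=1)    # integrate along time
    return np.angle(np.exp(1j * phase))          # wrap to (-pi, pi]
```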
1 code implementation • 25 Nov 2019 • Daiki Takeuchi, Kohei Yatabe, Yuma Koizumi, Yasuhiro Oikawa, Noboru Harada
Therefore, some end-to-end methods used a DNN to learn a linear T-F transform, which is much easier to interpret.
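A learned linear T-F transform can be realized as a strided 1-D convolution whose kernels act as learned basis functions; a minimal PyTorch sketch of the general idea (not the paper's exact model):

```python
import torch
import torch.nn as nn

# A strided Conv1d is a learnable linear "analysis" transform: each output
# channel is an inner product between a learned kernel and a signal frame.
analysis = nn.Conv1d(1, 256, kernel_size=512, stride=128, bias=False)

x = torch.randn(1, 1, 16000)   # (batch, channel, samples)
coeffs = analysis(x)           # (batch, 256 bases, frames)
print(coeffs.shape)
```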
no code implementations • 10 Mar 2019 • Yoshiki Masuyama, Kohei Yatabe, Yuma Koizumi, Yasuhiro Oikawa, Noboru Harada
This paper presents a novel method that reconstructs the phase only from a given amplitude spectrogram by combining a signal-processing-based approach and a deep neural network (DNN).
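As a classical signal-processing-based baseline for phase reconstruction from an amplitude spectrogram alone (a reference point, not the proposed DNN-combined method), Griffin-Lim is readily available:

```python
import numpy as np
import librosa

sr = 22050
y = librosa.chirp(fmin=200, fmax=2000, sr=sr, duration=1.0)  # test signal
S = np.abs(librosa.stft(y))                  # keep only the amplitude
y_rec = librosa.griffinlim(S, n_iter=60)     # iterative phase reconstruction
```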