1 code implementation • 21 Nov 2022 • Takenori Yoshimura, Shinji Takaki, Kazuhiro Nakamura, Keiichiro Oura, Yukiya Hono, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda
This paper integrates a classic mel-cepstral synthesis filter into a modern neural speech synthesis system towards end-to-end controllable speech synthesis.
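The core idea of mel-cepstral synthesis, representing the log spectral envelope with a handful of cepstral coefficients and exponentiating their Fourier series to obtain a filter response, can be sketched in plain Python. This is an illustrative stand-in, not the paper's differentiable mel-cepstral filter; real mel-cepstra also warp the frequency axis with an all-pass function, which is omitted here, and all names are made up:

```python
import cmath
import math

def cepstrum_to_spectrum(cep, n_freq=9):
    """Evaluate H(w) = exp(sum_m c_m * e^{-jwm}) on a grid of n_freq
    frequencies in [0, pi] and return the magnitude responses.
    (Illustrative: real mel-cepstral filters also warp the frequency
    axis, which this sketch omits.)"""
    mags = []
    for k in range(n_freq):
        w = math.pi * k / (n_freq - 1)
        log_h = sum(c * cmath.exp(-1j * w * m) for m, c in enumerate(cep))
        mags.append(abs(cmath.exp(log_h)))
    return mags

# A cepstrum with only c_0 set gives a constant gain of exp(c_0).
flat = cepstrum_to_spectrum([0.5])
```

A coefficient at lag 1 (c = [0.0, 1.0]) yields the magnitude exp(cos w), i.e. a gentle low-pass shape from e at w = 0 down to 1/e at w = pi, which shows how a few coefficients control the envelope.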
1 code implementation • 29 Oct 2018 • Yusuke Yasuda, Xin Wang, Shinji Takaki, Junichi Yamagishi
Toward end-to-end Japanese speech synthesis, we extend Tacotron with self-attention to capture long-term dependencies related to pitch accents, and compare the audio quality of the resulting systems with that of classical pipeline systems under various conditions to show their pros and cons.
1 code implementation • 29 Oct 2018 • Shinji Takaki, Toru Nakashika, Xin Wang, Junichi Yamagishi
This paper proposes a new loss based on short-time Fourier transform (STFT) spectra, with the aim of training a high-performance neural speech waveform model that directly predicts raw continuous speech waveform samples.
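One common shape for such an STFT-based spectral loss, comparing the magnitude spectra of overlapping frames in the log domain, can be sketched as follows. The paper's exact loss definition may differ, and the frame and hop sizes here are toy values:

```python
import cmath
import math

def stft_frames(x, frame_len=8, hop=4):
    """Split a waveform into overlapping frames and return the
    magnitude spectrum of each frame via a plain DFT (no analysis
    window, for brevity)."""
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        mags = []
        for k in range(frame_len // 2 + 1):
            s = sum(v * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n, v in enumerate(frame))
            mags.append(abs(s))
        frames.append(mags)
    return frames

def stft_loss(pred, target, eps=1e-7):
    """L1 distance between log-magnitude STFT spectra, one common form
    of spectral loss; the paper's exact definition may differ."""
    total, count = 0.0, 0
    for pf, tf in zip(stft_frames(pred), stft_frames(target)):
        for pm, tm in zip(pf, tf):
            total += abs(math.log(pm + eps) - math.log(tm + eps))
            count += 1
    return total / count
```

Because the loss compares spectra rather than raw samples, it is insensitive to phase shifts that the ear also ignores, which is the usual motivation for spectral losses on waveform models.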
1 code implementation • 10 Nov 2019 • Seyyed Saeed Sarfjoo, Xin Wang, Gustav Eje Henter, Jaime Lorenzo-Trueba, Shinji Takaki, Junichi Yamagishi
Nowadays vast amounts of speech data are recorded from low-quality recorder devices such as smartphones, tablets, laptops, and medium-quality microphones.
no code implementations • 7 Apr 2018 • Xin Wang, Jaime Lorenzo-Trueba, Shinji Takaki, Lauri Juvela, Junichi Yamagishi
Recent advances in speech synthesis suggest that limitations such as the lossy nature of the amplitude spectrum with minimum phase approximation and the over-smoothing effect in acoustic modeling can be overcome by using advanced machine learning approaches.
no code implementations • 27 Mar 2018 • Toru Nakashika, Shinji Takaki, Junichi Yamagishi
In contrast, the proposed feature extractor based on the complex-valued restricted Boltzmann machine (CRBM) directly encodes complex spectra (or another complex-valued representation of speech) into binary-valued latent features (hidden units).
no code implementations • 17 Jun 2015 • Zhenzhou Wu, Shinji Takaki, Junichi Yamagishi
This paper proposes a deep denoising auto-encoder technique to extract better acoustic features for speech synthesis.
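The denoising objective itself (corrupt the input, reconstruct the clean target) is easy to show in miniature. Below is a 1-D linear toy with made-up names, purely illustrative of the training criterion; the paper trains deep networks on acoustic features, not scalars:

```python
import random

def train_linear_dae(data, noise_std=0.1, lr=0.02, epochs=1000, seed=0):
    """Toy 1-D 'denoising auto-encoder': learn a scalar encoder w and
    decoder v so that v * (w * (x + noise)) reconstructs the CLEAN x.
    Illustrates only the denoising criterion, not the paper's model."""
    rng = random.Random(seed)
    w, v = 0.5, 0.5
    for _ in range(epochs):
        for x in data:
            x_noisy = x + rng.gauss(0.0, noise_std)  # corrupt the input
            h = w * x_noisy                          # encode
            x_hat = v * h                            # decode
            err = x_hat - x                          # target is the clean sample
            g_v, g_w = err * h, err * v * x_noisy    # grads of 0.5 * err**2
            v -= lr * g_v
            w -= lr * g_w
    return w, v
```

After training, the encoder-decoder product settles near 1 (slightly below, shrunk by the noise variance), i.e. the model learns to undo the corruption rather than copy its input.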
no code implementations • 31 Jul 2018 • Yi Zhao, Shinji Takaki, Hieu-Thi Luong, Junichi Yamagishi, Daisuke Saito, Nobuaki Minematsu
In order to reduce the mismatched characteristics between natural and generated acoustic features, we propose frameworks that incorporate either a conditional generative adversarial network (GAN) or its variant, Wasserstein GAN with gradient penalty (WGAN-GP), into multi-speaker speech synthesis that uses the WaveNet vocoder.
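The WGAN-GP component refers to the standard gradient penalty, lam * E[(||grad_x D(x_hat)|| - 1)^2], evaluated at random interpolates x_hat between real and generated samples. A hand-checkable sketch using a linear critic, whose input gradient is simply its weight vector (all names here are illustrative, not the paper's code):

```python
import math
import random

def gradient_penalty(w, real, fake, lam=10.0, seed=0):
    """WGAN-GP penalty for a linear critic D(x) = sum(w_i * x_i):
    lam * E[(||grad_x D(x_hat)|| - 1)^2] at random interpolates x_hat.
    For a linear critic the gradient is w everywhere, so x_hat does not
    change the value, which keeps this sketch checkable by hand."""
    rng = random.Random(seed)
    total = 0.0
    for r, f in zip(real, fake):
        eps = rng.random()
        # x_hat marks where the gradient would be taken for a general critic
        x_hat = [eps * ri + (1 - eps) * fi for ri, fi in zip(r, f)]
        grad = w                      # d/dx_i of sum(w_i * x_i) is w_i
        norm = math.sqrt(sum(g * g for g in grad))
        total += (norm - 1.0) ** 2
    return lam * total / len(real)
```

In a real model the gradient is obtained by automatic differentiation through the critic; the penalty pushes the critic toward unit gradient norm, enforcing the 1-Lipschitz constraint of the Wasserstein objective softly.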
no code implementations • 29 Oct 2018 • Xin Wang, Shinji Takaki, Junichi Yamagishi
Neural waveform models such as the WaveNet are used in many recent text-to-speech systems, but the original WaveNet is quite slow in waveform generation because of its autoregressive (AR) structure.
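The slowness comes from the AR sampling loop: sample t cannot be computed until samples 1..t-1 exist, so generating T samples takes T sequential network evaluations. A minimal sketch, with a toy predictor standing in for the network:

```python
def generate_ar(step_fn, init, n_samples):
    """Autoregressive generation: each output sample is a function of
    the previously generated samples, so the loop is inherently
    sequential (T steps for T samples). step_fn is a stand-in for the
    network's conditional predictor."""
    out = list(init)
    for _ in range(n_samples):
        out.append(step_fn(out))   # depends on everything generated so far
    return out[len(init):]

# toy 'network': the next sample is the mean of the last two samples
samples = generate_ar(lambda h: 0.5 * (h[-1] + h[-2]), [0.0, 1.0], 4)
```

Non-AR alternatives replace this loop with a single parallel pass over all T samples, trading the convenient AR factorization for generation speed.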
no code implementations • 29 Mar 2019 • Shinji Takaki, Hirokazu Kameoka, Junichi Yamagishi
Recently, we proposed short-time Fourier transform (STFT)-based loss functions for training a neural speech waveform model.
no code implementations • 27 Apr 2019 • Xin Wang, Shinji Takaki, Junichi Yamagishi
Other models such as Parallel WaveNet and ClariNet bring together the benefits of AR and inverse-autoregressive-flow (IAF) models and train an IAF model by transferring knowledge from a pre-trained AR teacher to an IAF student without any sequential transformation.
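In such probability-density distillation, the student is trained to match the teacher's per-timestep predictive distribution, for example via a KL divergence between the two output Gaussians. A simplified sketch using the closed-form 1-D Gaussian KL (the actual Parallel WaveNet and ClariNet objectives differ in detail, and the names here are illustrative):

```python
import math

def gaussian_kl(mu_q, sig_q, mu_p, sig_p):
    """Closed-form KL(q || p) between two 1-D Gaussians: the kind of
    per-sample divergence used to distill an AR teacher's predictive
    distribution p into an IAF student's q (simplified)."""
    return (math.log(sig_p / sig_q)
            + (sig_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sig_p ** 2)
            - 0.5)

def distill_loss(student, teacher):
    """Average per-timestep KL from student to teacher, with each
    prediction given as a (mean, std) pair."""
    kls = [gaussian_kl(mq, sq, mp, sp)
           for (mq, sq), (mp, sp) in zip(student, teacher)]
    return sum(kls) / len(kls)
```

Because the student is an IAF, all of its per-timestep (mean, std) predictions can be produced in one parallel pass, so distillation yields fast synthesis while inheriting the teacher's distribution.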
no code implementations • 24 Oct 2019 • Kazuhiro Nakamura, Shinji Takaki, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
Singing voice synthesis systems based on deep neural networks (DNNs) have recently been proposed and are improving the naturalness of synthesized singing voices.
no code implementations • 15 Feb 2021 • Yukiya Hono, Shinji Takaki, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
We also show that speech waveforms with pitches outside the range of the training data can be generated more naturally.
no code implementations • 31 Aug 2021 • Yoshihiko Nankaku, Kenta Sumiya, Takenori Yoshimura, Shinji Takaki, Kei Hashimoto, Keiichiro Oura, Keiichi Tokuda
This paper proposes a novel Sequence-to-Sequence (Seq2Seq) model integrating the structure of Hidden Semi-Markov Models (HSMMs) into its attention mechanism.