1 code implementation • 21 Nov 2022 • Takenori Yoshimura, Shinji Takaki, Kazuhiro Nakamura, Keiichiro Oura, Yukiya Hono, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda
This paper integrates a classic mel-cepstral synthesis filter into a modern neural speech synthesis system towards end-to-end controllable speech synthesis.
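The core idea of mel-cepstral synthesis, representing the log spectral envelope with a handful of cepstral coefficients and exponentiating their Fourier series to obtain a filter response, can be sketched in plain Python. This is an illustrative stand-in, not the paper's differentiable mel-cepstral filter; real mel-cepstra also warp the frequency axis with an all-pass function, which is omitted here, and all names are made up:

```python
import cmath
import math

def cepstrum_to_spectrum(cep, n_freq=9):
    """Evaluate H(w) = exp(sum_m c_m * e^{-jwm}) on a grid of n_freq
    frequencies in [0, pi] and return the magnitude responses.
    (Illustrative: real mel-cepstral filters also warp the frequency
    axis, which this sketch omits.)"""
    mags = []
    for k in range(n_freq):
        w = math.pi * k / (n_freq - 1)
        log_h = sum(c * cmath.exp(-1j * w * m) for m, c in enumerate(cep))
        mags.append(abs(cmath.exp(log_h)))
    return mags

# A cepstrum with only c_0 set gives a constant gain of exp(c_0).
flat = cepstrum_to_spectrum([0.5])
```

A coefficient at lag 1 (c = [0.0, 1.0]) yields the magnitude exp(cos w), i.e. a gentle low-pass shape from e at w = 0 down to 1/e at w = pi, which shows how a few coefficients control the envelope.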
1 code implementation • 29 Oct 2018 • Yusuke Yasuda, Xin Wang, Shinji Takaki, Junichi Yamagishi
Toward end-to-end Japanese speech synthesis, we extend Tacotron with self-attention to capture long-term dependencies related to pitch accents, and compare the audio quality of the resulting systems with that of classical pipeline systems under various conditions to show their pros and cons.
1 code implementation • 29 Oct 2018 • Shinji Takaki, Toru Nakashika, Xin Wang, Junichi Yamagishi
This paper proposes a new loss based on short-time Fourier transform (STFT) spectra, with the aim of training a high-performance neural speech waveform model that directly predicts raw continuous speech waveform samples.
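One common shape for such an STFT-based spectral loss, comparing the magnitude spectra of overlapping frames in the log domain, can be sketched as follows. The paper's exact loss definition may differ, and the frame and hop sizes here are toy values:

```python
import cmath
import math

def stft_frames(x, frame_len=8, hop=4):
    """Split a waveform into overlapping frames and return the
    magnitude spectrum of each frame via a plain DFT (no analysis
    window, for brevity)."""
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        mags = []
        for k in range(frame_len // 2 + 1):
            s = sum(v * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n, v in enumerate(frame))
            mags.append(abs(s))
        frames.append(mags)
    return frames

def stft_loss(pred, target, eps=1e-7):
    """L1 distance between log-magnitude STFT spectra, one common form
    of spectral loss; the paper's exact definition may differ."""
    total, count = 0.0, 0
    for pf, tf in zip(stft_frames(pred), stft_frames(target)):
        for pm, tm in zip(pf, tf):
            total += abs(math.log(pm + eps) - math.log(tm + eps))
            count += 1
    return total / count
```

Because the loss compares spectra rather than raw samples, it is insensitive to phase shifts that the ear also ignores, which is the usual motivation for spectral losses on waveform models.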
1 code implementation • 10 Nov 2019 • Seyyed Saeed Sarfjoo, Xin Wang, Gustav Eje Henter, Jaime Lorenzo-Trueba, Shinji Takaki, Junichi Yamagishi
Nowadays vast amounts of speech data are recorded from low-quality recorder devices such as smartphones, tablets, laptops, and medium-quality microphones.
no code implementations • 7 Apr 2018 • Xin Wang, Jaime Lorenzo-Trueba, Shinji Takaki, Lauri Juvela, Junichi Yamagishi
Recent advances in speech synthesis suggest that limitations such as the lossy nature of the amplitude spectrum with minimum phase approximation and the over-smoothing effect in acoustic modeling can be overcome by using advanced machine learning approaches.
no code implementations • 27 Mar 2018 • Toru Nakashika, Shinji Takaki, Junichi Yamagishi
In contrast, the proposed feature extractor based on the complex-valued restricted Boltzmann machine (CRBM) directly encodes complex spectra (or another complex-valued representation of speech) into binary-valued latent features (hidden units).
no code implementations • 17 Jun 2015 • Zhenzhou Wu, Shinji Takaki, Junichi Yamagishi
This paper proposes a deep denoising auto-encoder technique to extract better acoustic features for speech synthesis.
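The denoising objective itself (corrupt the input, reconstruct the clean target) is easy to show in miniature. Below is a 1-D linear toy with made-up names, purely illustrative of the training criterion; the paper trains deep networks on acoustic features, not scalars:

```python
import random

def train_linear_dae(data, noise_std=0.1, lr=0.02, epochs=1000, seed=0):
    """Toy 1-D 'denoising auto-encoder': learn a scalar encoder w and
    decoder v so that v * (w * (x + noise)) reconstructs the CLEAN x.
    Illustrates only the denoising criterion, not the paper's model."""
    rng = random.Random(seed)
    w, v = 0.5, 0.5
    for _ in range(epochs):
        for x in data:
            x_noisy = x + rng.gauss(0.0, noise_std)  # corrupt the input
            h = w * x_noisy                          # encode
            x_hat = v * h                            # decode
            err = x_hat - x                          # target is the clean sample
            g_v, g_w = err * h, err * v * x_noisy    # grads of 0.5 * err**2
            v -= lr * g_v
            w -= lr * g_w
    return w, v
```

After training, the encoder-decoder product settles near 1 (slightly below, shrunk by the noise variance), i.e. the model learns to undo the corruption rather than copy its input.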
no code implementations • 31 Jul 2018 • Yi Zhao, Shinji Takaki, Hieu-Thi Luong, Junichi Yamagishi, Daisuke Saito, Nobuaki Minematsu
In order to reduce the mismatched characteristics between natural and generated acoustic features, we propose frameworks that incorporate either a conditional generative adversarial network (GAN) or its variant, Wasserstein GAN with gradient penalty (WGAN-GP), into multi-speaker speech synthesis that uses the WaveNet vocoder.
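The WGAN-GP component refers to the standard gradient penalty, lam * E[(||grad_x D(x_hat)|| - 1)^2], evaluated at random interpolates x_hat between real and generated samples. A hand-checkable sketch using a linear critic, whose input gradient is simply its weight vector (all names here are illustrative, not the paper's code):

```python
import math
import random

def gradient_penalty(w, real, fake, lam=10.0, seed=0):
    """WGAN-GP penalty for a linear critic D(x) = sum(w_i * x_i):
    lam * E[(||grad_x D(x_hat)|| - 1)^2] at random interpolates x_hat.
    For a linear critic the gradient is w everywhere, so x_hat does not
    change the value, which keeps this sketch checkable by hand."""
    rng = random.Random(seed)
    total = 0.0
    for r, f in zip(real, fake):
        eps = rng.random()
        # x_hat marks where the gradient would be taken for a general critic
        x_hat = [eps * ri + (1 - eps) * fi for ri, fi in zip(r, f)]
        grad = w                      # d/dx_i of sum(w_i * x_i) is w_i
        norm = math.sqrt(sum(g * g for g in grad))
        total += (norm - 1.0) ** 2
    return lam * total / len(real)
```

In a real model the gradient is obtained by automatic differentiation through the critic; the penalty pushes the critic toward unit gradient norm, enforcing the 1-Lipschitz constraint of the Wasserstein objective softly.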
no code implementations • 29 Oct 2018 • Xin Wang, Shinji Takaki, Junichi Yamagishi
Neural waveform models such as the WaveNet are used in many recent text-to-speech systems, but the original WaveNet is quite slow in waveform generation because of its autoregressive (AR) structure.
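The slowness comes from the AR sampling loop: sample t cannot be computed until samples 1..t-1 exist, so generating T samples takes T sequential network evaluations. A minimal sketch, with a toy predictor standing in for the network:

```python
def generate_ar(step_fn, init, n_samples):
    """Autoregressive generation: each output sample is a function of
    the previously generated samples, so the loop is inherently
    sequential (T steps for T samples). step_fn is a stand-in for the
    network's conditional predictor."""
    out = list(init)
    for _ in range(n_samples):
        out.append(step_fn(out))   # depends on everything generated so far
    return out[len(init):]

# toy 'network': the next sample is the mean of the last two samples
samples = generate_ar(lambda h: 0.5 * (h[-1] + h[-2]), [0.0, 1.0], 4)
```

Non-AR alternatives replace this loop with a single parallel pass over all T samples, trading the convenient AR factorization for generation speed.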
no code implementations • 29 Mar 2019 • Shinji Takaki, Hirokazu Kameoka, Junichi Yamagishi
Recently, we proposed short-time Fourier transform (STFT)-based loss functions for training a neural speech waveform model.
no code implementations • 27 Apr 2019 • Xin Wang, Shinji Takaki, Junichi Yamagishi
Other models such as Parallel WaveNet and ClariNet bring together the benefits of AR and inverse-autoregressive-flow (IAF) models and train an IAF model by transferring knowledge from a pre-trained AR teacher to an IAF student without any sequential transformation.
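In such probability-density distillation, the student is trained to match the teacher's per-timestep predictive distribution, for example via a KL divergence between the two output Gaussians. A simplified sketch using the closed-form 1-D Gaussian KL (the actual Parallel WaveNet and ClariNet objectives differ in detail, and the names here are illustrative):

```python
import math

def gaussian_kl(mu_q, sig_q, mu_p, sig_p):
    """Closed-form KL(q || p) between two 1-D Gaussians: the kind of
    per-sample divergence used to distill an AR teacher's predictive
    distribution p into an IAF student's q (simplified)."""
    return (math.log(sig_p / sig_q)
            + (sig_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sig_p ** 2)
            - 0.5)

def distill_loss(student, teacher):
    """Average per-timestep KL from student to teacher, with each
    prediction given as a (mean, std) pair."""
    kls = [gaussian_kl(mq, sq, mp, sp)
           for (mq, sq), (mp, sp) in zip(student, teacher)]
    return sum(kls) / len(kls)
```

Because the student is an IAF, all of its per-timestep (mean, std) predictions can be produced in one parallel pass, so distillation yields fast synthesis while inheriting the teacher's distribution.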
no code implementations • 24 Oct 2019 • Kazuhiro Nakamura, Shinji Takaki, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
Singing voice synthesis systems based on deep neural networks (DNNs) have recently been proposed and are improving the naturalness of synthesized singing voices.
no code implementations • 15 Feb 2021 • Yukiya Hono, Shinji Takaki, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
We also show that speech waveforms with pitches outside the range of the training data can be generated more naturally.
no code implementations • 31 Aug 2021 • Yoshihiko Nankaku, Kenta Sumiya, Takenori Yoshimura, Shinji Takaki, Kei Hashimoto, Keiichiro Oura, Keiichi Tokuda
This paper proposes a novel Sequence-to-Sequence (Seq2Seq) model integrating the structure of Hidden Semi-Markov Models (HSMMs) into its attention mechanism.