no code implementations • 12 Apr 2017 • Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari
To give synthetic speech natural inter-utterance variation, this paper builds DNN acoustic models that make it possible to randomly sample speech parameters.
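A minimal sketch of one way such a sampling-capable acoustic model could be structured, assuming (hypothetically) a feed-forward network that takes frame-level linguistic features concatenated with a Gaussian noise vector, so that different noise draws yield different parameter trajectories for the same text; dimensions are placeholders and the paper's moment-matching-style training objective is not reproduced here:

```python
import torch
import torch.nn as nn

class SamplingAcousticModel(nn.Module):
    """Hypothetical acoustic model conditioned on a noise vector.

    Linguistic features plus Gaussian noise in, speech parameters
    (e.g., mel-cepstra) out; each noise draw gives a different but
    plausible parameter trajectory.
    """

    def __init__(self, ling_dim=300, noise_dim=16, out_dim=60, hidden=256):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(ling_dim + noise_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, ling_feats, noise=None):
        # ling_feats: (n_frames, ling_dim); fresh noise if none is given.
        if noise is None:
            noise = torch.randn(ling_feats.size(0), self.noise_dim)
        return self.net(torch.cat([ling_feats, noise], dim=-1))

model = SamplingAcousticModel()
ling = torch.randn(100, 300)   # stand-in for frame-level linguistic features
sample_a = model(ling)         # two calls with fresh noise give two different
sample_b = model(ling)         # "utterances" of the same text
```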
no code implementations • 9 Feb 2019 • Hiroki Tamaru, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari
To address this problem, we use a GMMN to model the variation of the modulation spectrum of the pitch contour of natural singing voices and add a randomized inter-utterance variation to the pitch contour generated by conventional DNN-based singing voice synthesis.
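The GMMN ingredient here is the maximum mean discrepancy (MMD) criterion, which trains a generator by matching the statistics of generated and natural samples. Below is a generic, hedged PyTorch sketch of a multi-kernel MMD loss between batches of modulation-spectrum vectors; the kernel bandwidths and batch shapes are illustrative assumptions, not the authors' settings:

```python
import torch

def gaussian_kernel(x, y, sigma):
    # x: (n, d), y: (m, d) -> (n, m) Gram matrix of an RBF kernel
    dist2 = torch.cdist(x, y).pow(2)
    return torch.exp(-dist2 / (2.0 * sigma ** 2))

def mmd_loss(generated, natural, sigmas=(1.0, 5.0, 10.0)):
    """Biased estimate of squared MMD with a mixture of RBF kernels."""
    loss = 0.0
    for s in sigmas:
        k_gg = gaussian_kernel(generated, generated, s).mean()
        k_nn = gaussian_kernel(natural, natural, s).mean()
        k_gn = gaussian_kernel(generated, natural, s).mean()
        loss = loss + k_gg + k_nn - 2.0 * k_gn
    return loss

gen = torch.randn(64, 32)   # toy batch of generated modulation-spectrum vectors
nat = torch.randn(64, 32)   # toy batch computed from natural singing voices
print(mmd_loss(gen, nat))
```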
no code implementations • 22 Apr 2020 • Tomoki Koriyama, Hiroshi Saruwatari
This paper presents a deep Gaussian process (DGP) model with a recurrent architecture for speech sequence modeling.
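For intuition only, here is a toy sketch of the "stack of kernel regressions" idea behind a DGP, using exact GP posterior means with made-up latent targets; a real DGP infers its intermediate layers variationally, and the recurrent architecture described above would additionally feed previous-frame latents into each layer's kernel inputs, which this sketch omits:

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale ** 2)

def gp_layer(x_train, y_train, x_test, noise=1e-2):
    """Posterior mean of GP regression: one 'Bayesian kernel regression' layer."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_test, x_train)
    return K_s @ np.linalg.solve(K, y_train)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))     # toy frame-level input features
H = rng.normal(size=(50, 4))     # toy latent targets for layer 1 (a real DGP infers these)
Y = rng.normal(size=(50, 2))     # toy output targets for layer 2
X_new = rng.normal(size=(10, 8))

# Layer 1 maps inputs to a latent space; layer 2 maps that latent space to outputs.
h_new = gp_layer(X, H, X_new)
y_new = gp_layer(H, Y, h_new)
```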
no code implementations • LREC 2020 • Yuki Yamashita, Tomoki Koriyama, Yuki Saito, Shinnosuke Takamichi, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari
In this paper, we investigate the effectiveness of using rich annotations in deep neural network (DNN)-based statistical speech synthesis.
no code implementations • 7 Aug 2020 • Kentaro Mitsui, Tomoki Koriyama, Hiroshi Saruwatari
We propose a framework for multi-speaker speech synthesis using deep Gaussian processes (DGPs); a DGP is a deep architecture of Bayesian kernel regressions and is thus robust to overfitting.
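One common way to make such a model multi-speaker, assumed here purely for illustration (the paper's exact speaker representation may differ), is to append a speaker code to every frame's input features before the first regression layer:

```python
import numpy as np

def add_speaker_code(frame_feats, speaker_id, n_speakers):
    """Append a one-hot speaker code to every frame of linguistic features."""
    code = np.zeros(n_speakers)
    code[speaker_id] = 1.0
    return np.hstack([frame_feats, np.tile(code, (len(frame_feats), 1))])

feats = np.random.randn(120, 300)                      # toy frame-level linguistic features
x = add_speaker_code(feats, speaker_id=3, n_speakers=10)
# x has shape (120, 310) and can be fed to the first kernel-regression layer.
```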
no code implementations • 31 Oct 2022 • Koichi Miyazaki, Masato Murata, Tomoki Koriyama
Automatic speech recognition (ASR) systems developed in recent years have shown promising results with self-attention models (e.g., Transformer and Conformer), which are replacing conventional recurrent neural networks.
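For orientation, the core component shared by Transformer and Conformer encoders is the self-attention block, in which every acoustic frame attends to every other frame. The following is a generic PyTorch sketch with placeholder dimensions, not the ASR system studied in the paper:

```python
import torch
import torch.nn as nn

class SelfAttentionEncoderBlock(nn.Module):
    """One Transformer-style encoder block over a sequence of acoustic frames."""

    def __init__(self, d_model=256, n_heads=4, d_ff=1024, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x, key_padding_mask=None):
        # x: (batch, n_frames, d_model); every frame attends to every other frame.
        attn_out, _ = self.attn(x, x, x, key_padding_mask=key_padding_mask)
        x = self.norm1(x + self.drop(attn_out))
        return self.norm2(x + self.drop(self.ff(x)))

frames = torch.randn(2, 200, 256)   # a batch of 2 utterances, 200 frames each
out = SelfAttentionEncoderBlock()(frames)
```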
no code implementations • 27 Feb 2023 • Dong Yang, Tomoki Koriyama, Yuki Saito, Takaaki Saeki, Detai Xin, Hiroshi Saruwatari
We also leverage duration-aware pause insertion for more natural multi-speaker TTS.
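A hypothetical illustration of duration-aware pause insertion: given pause durations predicted at word boundaries (by some upstream predictor, not shown), a pause token carrying its duration is inserted into the phoneme sequence whenever the prediction exceeds a threshold. Function names, the threshold, and the ARPABET toy input are assumptions for this sketch, not the paper's implementation:

```python
def insert_pauses(phonemes, pause_durations, threshold=0.05):
    """Insert a duration-tagged pause token after word boundaries with long predicted pauses.

    phonemes: list of (phoneme, is_word_final) pairs
    pause_durations: predicted pause length in seconds after each word-final phoneme
    """
    out, k = [], 0
    for ph, is_word_final in phonemes:
        out.append(ph)
        if is_word_final:
            dur = pause_durations[k]
            k += 1
            if dur > threshold:
                out.append(f"<pause:{dur:.2f}>")
    return out

# Example: "hello world" with one long predicted pause between the words.
seq = [("HH", False), ("AH", False), ("L", False), ("OW", True),
       ("W", False), ("ER", False), ("L", False), ("D", True)]
print(insert_pauses(seq, pause_durations=[0.30, 0.00]))
```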
no code implementations • 1 Feb 2024 • Dong Yang, Tomoki Koriyama, Yuki Saito
Developing Text-to-Speech (TTS) systems that can synthesize natural breath is essential for human-like voice agents but requires extensive manual annotation of breath positions in training data.