Search Results for author: Tomoki Koriyama

Found 8 papers, 0 papers with code

Frame-Wise Breath Detection with Self-Training: An Exploration of Enhancing Breath Naturalness in Text-to-Speech

no code implementations • 1 Feb 2024 • Dong Yang, Tomoki Koriyama, Yuki Saito

Developing Text-to-Speech (TTS) systems that can synthesize natural breath is essential for human-like voice agents but requires extensive manual annotation of breath positions in training data.
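Self-training of this kind is typically implemented as confidence-thresholded pseudo-labeling: a teacher model labels unlabeled frames, and only confident frames are kept for retraining. A minimal frame-wise sketch of that generic idea (function name and threshold are illustrative assumptions, not from the paper):

```python
import numpy as np

def pseudo_label_frames(probs, threshold=0.9):
    """Turn frame-wise breath probabilities from a teacher model into
    pseudo-labels, keeping only confident frames (generic self-training
    sketch; the 0.9 threshold is an assumed hyperparameter)."""
    probs = np.asarray(probs, dtype=float)
    labels = (probs >= 0.5).astype(int)                 # 1 = breath frame
    confident = np.abs(probs - 0.5) >= (threshold - 0.5)
    return labels, confident                            # retrain only on confident frames

labels, mask = pseudo_label_frames([0.02, 0.97, 0.55, 0.91])
# frame 2 (prob 0.55) is too uncertain to pass the 0.9 threshold
```

Frames the teacher is unsure about are simply excluded from the student's training loss, which is what keeps self-training from amplifying its own mistakes.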

Structured State Space Decoder for Speech Recognition and Synthesis

no code implementations • 31 Oct 2022 • Koichi Miyazaki, Masato Murata, Tomoki Koriyama

Automatic speech recognition (ASR) systems developed in recent years have shown promising results with self-attention models (e.g., Transformer and Conformer), which are replacing conventional recurrent neural networks.

Automatic Speech Recognition (ASR) +1
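The core building block of structured state space models is a discretized linear state-space recurrence applied along the sequence. A minimal sketch of that generic recurrence (toy matrices chosen for illustration, not the paper's parameterization):

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Run the linear recurrence x_k = A x_{k-1} + B u_k, y_k = C x_k
    over a scalar input sequence u (generic state-space model sketch)."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_k in u:
        x = A @ x + B * u_k     # state update
        ys.append(C @ x)        # readout
    return np.array(ys)

# toy 2-state system driven by an impulse
A = np.array([[0.9, 0.0], [0.1, 0.8]])
B = np.array([1.0, 0.0])
C = np.array([0.0, 1.0])
y = ssm_scan(A, B, C, [1.0, 0.0, 0.0, 0.0])
# y traces the impulse response [0.0, 0.1, 0.17, 0.217]
```

In practice the same computation can be unrolled as a long convolution, which is what makes these layers efficient to train compared with step-by-step RNNs.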

Multi-speaker Text-to-speech Synthesis Using Deep Gaussian Processes

no code implementations • 7 Aug 2020 • Kentaro Mitsui, Tomoki Koriyama, Hiroshi Saruwatari

We propose a framework for multi-speaker speech synthesis using deep Gaussian processes (DGPs); a DGP is a deep architecture of Bayesian kernel regressions and thus robust to overfitting.

Gaussian Processes · Speech Synthesis +1
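A deep Gaussian process stacks kernel regressions, with one layer's output feeding the next layer's kernel. A minimal noise-free sketch using the GP posterior mean with an RBF kernel (illustrative two-layer toy, not the authors' model or training procedure):

```python
import numpy as np

def rbf(X, Z, ls=1.0):
    """RBF kernel matrix between row-vector sets X and Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def gp_layer(Xtr, ytr, Xte, jitter=1e-6):
    """GP regression posterior mean: K_*x (K_xx + jitter I)^{-1} y."""
    Kxx = rbf(Xtr, Xtr) + jitter * np.eye(len(Xtr))
    return rbf(Xte, Xtr) @ np.linalg.solve(Kxx, ytr)

# two stacked GP layers: layer-1 predictions become layer-2 inputs
Xtr = np.linspace(0, 1, 5)[:, None]
h_tr = np.sin(2 * np.pi * Xtr)      # toy hidden-layer targets
y_tr = h_tr ** 2                    # toy final targets
Xte = np.array([[0.5]])
h_te = gp_layer(Xtr, h_tr, Xte)     # layer 1: inputs -> hidden features
y_te = gp_layer(h_tr, y_tr, h_te)   # layer 2: hidden features -> output
```

Because each layer is a Bayesian kernel regression rather than a point-estimated weight matrix, the composition inherits the robustness to overfitting mentioned in the abstract.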

Utterance-level Sequential Modeling For Deep Gaussian Process Based Speech Synthesis Using Simple Recurrent Unit

no code implementations • 22 Apr 2020 • Tomoki Koriyama, Hiroshi Saruwatari

This paper presents a deep Gaussian process (DGP) model with a recurrent architecture for speech sequence modeling.

Speech Synthesis

Generative Moment Matching Network-based Random Modulation Post-filter for DNN-based Singing Voice Synthesis and Neural Double-tracking

no code implementations • 9 Feb 2019 • Hiroki Tamaru, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari

To address this problem, we use a GMMN to model the variation of the modulation spectrum of the pitch contour of natural singing voices and add a randomized inter-utterance variation to the pitch contour generated by conventional DNN-based singing voice synthesis.

Singing Voice Synthesis
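A generative moment matching network (GMMN) is trained by minimizing the maximum mean discrepancy (MMD) between generated and natural samples. A minimal Gaussian-kernel MMD estimator (the generic loss underlying GMMNs, not the paper's exact objective or bandwidth):

```python
import numpy as np

def mmd2(X, Y, sigma=1.0):
    """Biased estimate of squared MMD between sample sets X and Y
    under a Gaussian kernel (generic GMMN training loss)."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma**2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
same = mmd2(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)))
diff = mmd2(rng.normal(size=(200, 2)), rng.normal(3.0, 1.0, size=(200, 2)))
# samples from the same distribution give a far smaller MMD than shifted ones
```

Minimizing this quantity pushes the generator's sample distribution toward the natural one, which is how the modulation-spectrum variation of natural pitch contours can be matched without an adversarial discriminator.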

Sampling-based speech parameter generation using moment-matching networks

no code implementations • 12 Apr 2017 • Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari

To give synthetic speech natural inter-utterance variation, this paper builds DNN acoustic models that make it possible to randomly sample speech parameters.

Speech Synthesis
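One simple way to make an acoustic model sampleable is to have it predict a distribution per parameter and draw from it at synthesis time. A minimal sketch of drawing from a predicted diagonal Gaussian (a generic stand-in for illustration, not the paper's moment-matching sampler; all names are hypothetical):

```python
import numpy as np

def sample_parameters(mean, log_var, rng):
    """Draw speech parameters from a predicted diagonal Gaussian
    (generic sampling sketch; mean/log_var would come from a DNN)."""
    mean = np.asarray(mean, dtype=float)
    std = np.exp(0.5 * np.asarray(log_var, dtype=float))
    return mean + std * rng.standard_normal(len(mean))

rng = np.random.default_rng(0)
# two draws from the same predicted distribution differ, giving the
# inter-utterance variation the abstract aims for
draw1 = sample_parameters([0.0, 1.0], [-2.0, -2.0], rng)
draw2 = sample_parameters([0.0, 1.0], [-2.0, -2.0], rng)
```

Each synthesis pass then yields a slightly different but equally plausible parameter trajectory instead of the single averaged output of a deterministic DNN.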
