Search Results for author: Toshiyuki Kumakura

Found 5 papers, 1 paper with code

SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization

1 code implementation • 16 May 2022 • Yuhta Takida, Takashi Shibuya, WeiHsiang Liao, Chieh-Hsin Lai, Junki Ohmura, Toshimitsu Uesaka, Naoki Murata, Shusuke Takahashi, Toshiyuki Kumakura, Yuki Mitsufuji

In this paper, we propose a new training scheme that extends the standard VAE via novel stochastic dequantization and quantization, called stochastically quantized variational autoencoder (SQ-VAE).

Quantization
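
The key idea named in the abstract, stochastic quantization, can be illustrated with a toy sketch: instead of deterministically snapping an encoder output to its nearest codebook entry (as in a standard VQ-VAE), a codebook index is sampled with probability proportional to exp(−‖z − e_k‖²/τ), so the assignment anneals toward the nearest neighbour as the temperature τ shrinks. This is a minimal pure-Python sketch under those assumptions, not the paper's implementation; all names (`stochastic_quantize`, `tau`) are illustrative.

```python
import math
import random

def stochastic_quantize(z, codebook, tau=1.0):
    """Sample a codebook index with probability proportional to
    exp(-||z - e_k||^2 / tau): a softened, stochastic nearest-neighbour
    assignment. As tau -> 0 this approaches deterministic quantization."""
    # Squared Euclidean distance from z to every codebook entry
    dists = [sum((zi - ei) ** 2 for zi, ei in zip(z, e)) for e in codebook]
    # Softmax over negative scaled distances (shifted for numerical stability)
    m = min(d / tau for d in dists)
    weights = [math.exp(-(d / tau) + m) for d in dists]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Draw one index from the categorical distribution
    r = random.random()
    acc = 0.0
    for k, p in enumerate(probs):
        acc += p
        if r < acc:
            return k, probs
    return len(probs) - 1, probs
```

With a small τ, an input close to one codebook entry is assigned to it almost deterministically, which mirrors the "self-annealing" behaviour the title refers to.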

Polyphone disambiguation and accent prediction using pre-trained language models in Japanese TTS front-end

no code implementations • 24 Jan 2022 • Rem Hida, Masaki Hamada, Chie Kamada, Emiru Tsunoo, Toshiyuki Sekiya, Toshiyuki Kumakura

Although end-to-end text-to-speech (TTS) models can generate natural speech, challenges still remain when it comes to estimating sentence-level phonetic and prosodic information from raw text in Japanese TTS systems.

Morphological Analysis • Polyphone Disambiguation

Towards Online End-to-end Transformer Automatic Speech Recognition

no code implementations • 25 Oct 2019 • Emiru Tsunoo, Yosuke Kashiwagi, Toshiyuki Kumakura, Shinji Watanabe

In this paper, we extend Transformer-based ASR towards an entire online E2E ASR system by introducing an online decoding process inspired by monotonic chunkwise attention (MoChA) into the Transformer decoder.

Automatic Speech Recognition • Speech Recognition
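
The decoding process the abstract mentions can be sketched in its test-time (hard) form: at each output step the decoder scans encoder frames left to right from where it last stopped, selects the first frame whose sigmoid "selection" probability crosses 0.5, and then attends softly over a small chunk ending at that frame. The sketch below is a minimal pure-Python illustration of that MoChA-style inference step, not the paper's system; the function and argument names are assumptions.

```python
import math

def mocha_decode_step(energies, chunk_energies, start, chunk_size=3):
    """One hard (test-time) MoChA decoding step: monotonically scan encoder
    frames from `start`, stop at the first frame whose selection probability
    sigmoid(energy) >= 0.5, then softmax-attend over the trailing chunk."""
    selected = None
    for j in range(start, len(energies)):
        p_select = 1.0 / (1.0 + math.exp(-energies[j]))  # sigmoid
        if p_select >= 0.5:
            selected = j
            break
    if selected is None:
        # No frame selected: emit nothing and keep the scan position
        return start, []
    # Soft attention over the chunk of frames ending at the selected one
    lo = max(0, selected - chunk_size + 1)
    ws = [math.exp(chunk_energies[k]) for k in range(lo, selected + 1)]
    z = sum(ws)
    attn = [w / z for w in ws]
    # The next step resumes scanning after the selected frame (monotonicity)
    return selected + 1, attn
```

Because the scan position only moves forward, latency is bounded per output token, which is what makes the attention usable for online (streaming) recognition.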

Transformer ASR with Contextual Block Processing

no code implementations • 16 Oct 2019 • Emiru Tsunoo, Yosuke Kashiwagi, Toshiyuki Kumakura, Shinji Watanabe

In this paper, we propose a new block processing method for the Transformer encoder by introducing a context-aware inheritance mechanism.

Automatic Speech Recognition • Speech Recognition
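
The block processing with a context-inheritance mechanism described in the abstract can be sketched as follows: the utterance is split into fixed-size blocks, and each block is encoded together with a context vector carried over from the previous block's output, so information flows across block boundaries. In this pure-Python toy, a running mean of the encoded block stands in for the learned context embedding; it is an illustration of the idea under that simplification, not the paper's encoder.

```python
def contextual_blocks(frames, block_size):
    """Toy context-aware block processing: process `frames` block by block,
    feeding each block a context value inherited from the previous block.
    Here "encoding" is just adding the context; a real encoder would be a
    Transformer layer and the context a learned embedding."""
    context = 0.0
    outputs = []
    for i in range(0, len(frames), block_size):
        block = frames[i:i + block_size]
        # Encode the current block jointly with the inherited context
        encoded = [x + context for x in block]
        outputs.extend(encoded)
        # Summarize this block's output as the context for the next block
        context = sum(encoded) / len(encoded)
    return outputs
```

The point of the structure is that each block sees only a fixed window of input plus one compact summary of the past, keeping per-block computation constant regardless of utterance length.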

End-to-end Adaptation with Backpropagation through WFST for On-device Speech Recognition System

no code implementations • 17 May 2019 • Emiru Tsunoo, Yosuke Kashiwagi, Satoshi Asakawa, Toshiyuki Kumakura

We convert a pretrained WFST to a trainable neural network and adapt the system to target environments/vocabulary by E2E joint training with an AM.

Speech Recognition
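
The core idea, making a WFST trainable by backpropagation, can be illustrated with a toy: keep each state's outgoing arc weights as logits, normalize them with a softmax, and adapt them by gradient descent on the negative log-likelihood of target arcs (e.g. new vocabulary observed in the target environment). This is a deliberately minimal sketch of that idea, not the paper's joint AM training; the data layout and names are assumptions.

```python
import math

def adapt_wfst(logits, target_arcs, lr=0.5, steps=50):
    """Treat WFST arc weights as trainable parameters.

    logits: {state: {label: logit}} -- outgoing arc scores per state.
    target_arcs: list of (state, label) arcs observed in adaptation data.
    Each step applies the softmax cross-entropy gradient to the state's
    outgoing arcs, pushing probability mass toward the observed label.
    """
    for _ in range(steps):
        for state, label in target_arcs:
            arcs = logits[state]
            z = sum(math.exp(v) for v in arcs.values())
            probs = {a: math.exp(v) / z for a, v in arcs.items()}
            for a, p in probs.items():
                # d(-log p_label)/d(logit_a) = p_a - 1{a == label}
                arcs[a] -= lr * (p - (1.0 if a == label else 0.0))
    return logits
```

Converting the arcs to normalized logits is what turns the static decoding graph into a differentiable layer, so its weights can in principle be updated jointly with an acoustic model.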
