no code implementations • CAI (COLING) 2022 • Zhuo Gong, Daisuke Saito, Sheng Li, Hisashi Kawai, Nobuaki Minematsu
The experiments show that we can enhance an ASR E2E model based on encoder-decoder architecture by pre-training the decoder with text data.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
no code implementations • 18 Sep 2023 • Shinnosuke Takamichi, Hiroki Maeda, Joonyong Park, Daisuke Saito, Hiroshi Saruwatari
In this study, we investigate whether speech symbols, learned through deep learning, follow Zipf's law, akin to natural language symbols.
no code implementations • 12 Oct 2019 • Yasuhiro Watanabe, Hironori Washizaki, Kazunori Sakamoto, Daisuke Saito, Kiyoshi Honda, Naohiko Tsuda, Yoshiaki Fukazawa, Nobukazu Yoshioka
Previous machine learning (ML) system development research suggests that emerging software quality attributes are a concern due to the probabilistic behavior of ML systems.
no code implementations • 31 Jul 2018 • Yi Zhao, Shinji Takaki, Hieu-Thi Luong, Junichi Yamagishi, Daisuke Saito, Nobuaki Minematsu
In order to reduce the mismatched characteristics between natural and generated acoustic features, we propose frameworks that incorporate either a conditional generative adversarial network (GAN) or its variant, Wasserstein GAN with gradient penalty (WGAN-GP), into multi-speaker speech synthesis that uses the WaveNet vocoder.
no code implementations • 23 Apr 2018 • Tomi Kinnunen, Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio, Zhen-Hua Ling
As a supplement to subjective results for the 2018 Voice Conversion Challenge (VCC'18) data, we configure a standard constant-Q cepstral coefficient CM to quantify the extent of processing artifacts.
no code implementations • 12 Apr 2018 • Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio, Tomi Kinnunen, Zhen-Hua Ling
We present the Voice Conversion Challenge 2018, designed as a follow up to the 2016 edition with the aim of providing a common framework for evaluating and comparing different state-of-the-art voice conversion (VC) systems.