Search Results for author: Yusuke Ijima

Found 10 papers, 1 paper with code

Speech Rhythm-Based Speaker Embeddings Extraction from Phonemes and Phoneme Duration for Multi-Speaker Speech Synthesis

no code implementations • 11 Feb 2024 • Kenichi Fujita, Atsushi Ando, Yusuke Ijima

This paper proposes a speech rhythm-based method for extracting speaker embeddings that models phoneme duration from only a few utterances by the target speaker.

Speaker Identification, Speech Synthesis
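No implementation accompanies this paper; purely as an illustration of the stated idea, here is a minimal PyTorch sketch of a rhythm encoder that maps (phoneme ID, duration) sequences to a fixed-size speaker embedding. The architecture, dimensions, and names are assumptions for illustration, not the authors' model.

```python
import torch
import torch.nn as nn

class RhythmSpeakerEncoder(nn.Module):
    """Hypothetical rhythm encoder: phoneme IDs + durations -> speaker embedding.

    Architecture and dimensions are illustrative assumptions, not the
    authors' actual model.
    """
    def __init__(self, n_phonemes=50, emb_dim=64, hidden=128, spk_dim=256):
        super().__init__()
        self.phoneme_emb = nn.Embedding(n_phonemes, emb_dim)
        # +1 input feature carries the (log-)duration of each phoneme
        self.rnn = nn.GRU(emb_dim + 1, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, spk_dim)

    def forward(self, phoneme_ids, durations):
        # phoneme_ids: (B, T) int64; durations: (B, T) float (frames or seconds)
        x = self.phoneme_emb(phoneme_ids)                          # (B, T, emb_dim)
        x = torch.cat([x, durations.log1p().unsqueeze(-1)], dim=-1)
        h, _ = self.rnn(x)                                         # (B, T, 2*hidden)
        e = self.proj(h.mean(dim=1))                               # temporal mean pooling
        return torch.nn.functional.normalize(e, dim=-1)            # unit-norm embedding

# Toy usage: embed speech rhythm from a few utterances of one speaker.
enc = RhythmSpeakerEncoder()
ids = torch.randint(0, 50, (2, 30))          # two utterances, 30 phonemes each
durs = torch.rand(2, 30) * 20                # per-phoneme durations in frames
spk_embedding = enc(ids, durs).mean(dim=0)   # average over the utterances
print(spk_embedding.shape)                   # torch.Size([256])
```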

What Do Self-Supervised Speech and Speaker Models Learn? New Findings From a Cross Model Layer-Wise Analysis

no code implementations • 31 Jan 2024 • Takanori Ashihara, Marc Delcroix, Takafumi Moriya, Kohei Matsuura, Taichi Asami, Yusuke Ijima

Our analysis reveals that 1) the capacity to represent content information is somewhat unrelated to enhanced speaker representation, 2) specific layers of speech SSL models appear partly specialized in capturing linguistic information, and 3) speaker SSL models tend to disregard linguistic information but exhibit more sophisticated speaker representations.

Self-Supervised Learning
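Cross-model layer-wise comparisons of this kind are commonly done with centered kernel alignment (CKA). Below is a small NumPy sketch of linear CKA between per-layer feature matrices of two models; linear CKA itself is standard, but treating it as this paper's exact analysis protocol would be an assumption.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two feature matrices of shape (n_samples, dim).

    Returns a similarity in [0, 1]; 1 means the representations match
    up to rotation and isotropic scaling.
    """
    X = X - X.mean(axis=0, keepdims=True)   # center each feature dimension
    Y = Y - Y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(X.T @ Y, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return hsic / (norm_x * norm_y)

# Toy usage: compare every layer of a "speech" model against every layer
# of a "speaker" model (random stand-ins for real hidden states).
rng = np.random.default_rng(0)
speech_layers = [rng.standard_normal((500, 768)) for _ in range(4)]
speaker_layers = [rng.standard_normal((500, 512)) for _ in range(4)]
sim = np.array([[linear_cka(a, b) for b in speaker_layers] for a in speech_layers])
print(sim.round(3))  # (4, 4) layer-by-layer similarity matrix
```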

Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters

no code implementations • 10 Jan 2024 • Kenichi Fujita, Hiroshi Sato, Takanori Ashihara, Hiroki Kanagawa, Marc Delcroix, Takafumi Moriya, Yusuke Ijima

The zero-shot text-to-speech (TTS) method, based on speaker embeddings extracted from reference speech using self-supervised learning (SSL) speech representations, can reproduce speaker characteristics very accurately.

Self-Supervised Learning, Speech Enhancement +2
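For readers unfamiliar with adapters: a typical design freezes the SSL encoder and inserts small residual bottlenecks as the only trainable components. A minimal PyTorch sketch follows; the bottleneck size, zero initialization, and placement are generic assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Residual bottleneck adapter: the only trainable part of a frozen layer."""
    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)   # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))  # residual bottleneck

class AdaptedLayer(nn.Module):
    """Wraps one frozen encoder layer with a trainable adapter."""
    def __init__(self, frozen_layer, dim=768):
        super().__init__()
        self.layer = frozen_layer
        for p in self.layer.parameters():
            p.requires_grad = False       # SSL weights stay fixed
        self.adapter = Adapter(dim)

    def forward(self, x):
        return self.adapter(self.layer(x))

# Toy usage with a stand-in "layer"; a real model would wrap each
# transformer block of the SSL encoder the same way.
layer = AdaptedLayer(nn.Linear(768, 768))
out = layer(torch.randn(2, 100, 768))    # (batch, frames, dim)
print(out.shape)
```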

StyleCap: Automatic Speaking-Style Captioning from Speech Based on Speech and Language Self-supervised Learning Models

no code implementations • 28 Nov 2023 • Kazuki Yamauchi, Yusuke Ijima, Yuki Saito

The experimental results demonstrate that our StyleCap leveraging richer LLMs for the text decoder, speech self-supervised learning (SSL) features, and sentence rephrasing augmentation improves the accuracy and diversity of generated speaking-style captions.

Language Modelling, Large Language Model +2
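The connection the abstract describes, feeding speech SSL features into an LLM-based text decoder, can be sketched as a small bridge module that projects pooled speech features into the decoder's embedding space as prefix tokens. Everything below (pooling, projection, dimensions, the prefix scheme) is an illustrative assumption, not StyleCap's published architecture.

```python
import torch
import torch.nn as nn

class SpeechToPrefix(nn.Module):
    """Hypothetical bridge: pooled speech SSL features -> LLM prefix embeddings."""
    def __init__(self, ssl_dim=768, llm_dim=2048, n_prefix=8):
        super().__init__()
        self.n_prefix = n_prefix
        self.llm_dim = llm_dim
        self.proj = nn.Linear(ssl_dim, n_prefix * llm_dim)

    def forward(self, ssl_feats):
        # ssl_feats: (B, T, ssl_dim) frame-level features from a speech SSL model
        pooled = ssl_feats.mean(dim=1)            # utterance-level vector
        prefix = self.proj(pooled)                # (B, n_prefix * llm_dim)
        # Reshape into pseudo-token embeddings placed before the caption tokens.
        return prefix.view(-1, self.n_prefix, self.llm_dim)

bridge = SpeechToPrefix()
prefix = bridge(torch.randn(2, 300, 768))
print(prefix.shape)  # torch.Size([2, 8, 2048])
```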

Zero-shot text-to-speech synthesis conditioned using self-supervised speech representation model

no code implementations • 24 Apr 2023 • Kenichi Fujita, Takanori Ashihara, Hiroki Kanagawa, Takafumi Moriya, Yusuke Ijima

This paper proposes a zero-shot text-to-speech (TTS) method conditioned on a speech-representation model acquired through self-supervised learning (SSL).

Self-Supervised Learning, Speech Synthesis +1
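One generic way to realize such conditioning is to pool the SSL features of the reference speech into a single vector and broadcast-add it onto the TTS encoder states. The sketch below shows only this scheme under assumed dimensions; the paper's actual architecture may differ.

```python
import torch
import torch.nn as nn

def condition_encoder(text_states, ref_ssl_feats, proj):
    """Broadcast-add a reference-speech embedding onto TTS encoder states.

    A generic conditioning scheme, assumed for illustration; not
    necessarily the paper's architecture.
    """
    spk = proj(ref_ssl_feats.mean(dim=1))        # (B, d): pool SSL frames, project
    return text_states + spk.unsqueeze(1)        # add to every text position

proj = nn.Linear(768, 256)                       # SSL dim -> TTS encoder dim
text_states = torch.randn(2, 40, 256)            # (batch, phonemes, d)
ref_ssl = torch.randn(2, 300, 768)               # SSL frames of reference speech
out = condition_encoder(text_states, ref_ssl, proj)
print(out.shape)                                 # torch.Size([2, 40, 256])
```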

SIMD-size aware weight regularization for fast neural vocoding on CPU

no code implementations • 2 Nov 2022 • Hiroki Kanagawa, Yusuke Ijima

Pruning time-consuming DNN modules is a promising way to realize real-time neural vocoders (e.g., WaveRNN, LPCNet) on a CPU.
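The title's idea is that weights should become zero in contiguous blocks whose width matches the CPU's SIMD registers, so each pruned block skips a whole vector multiply-accumulate. A minimal sketch of a group-lasso penalty over SIMD-width blocks follows; the block width of 8 (e.g., eight fp32 lanes in a 256-bit AVX2 register) and the penalty form are assumptions, not the paper's exact regularizer.

```python
import torch

def simd_group_lasso(weight, simd_width=8):
    """Group-lasso penalty over contiguous blocks of `simd_width` weights.

    Driving whole blocks to zero lets a pruned layer skip entire SIMD
    multiply-accumulates at inference time. simd_width=8 is an assumption
    (e.g., eight fp32 lanes of AVX2).
    """
    out_dim, in_dim = weight.shape
    assert in_dim % simd_width == 0, "pad in_dim to a multiple of simd_width"
    blocks = weight.view(out_dim, in_dim // simd_width, simd_width)
    return blocks.norm(dim=-1).sum()   # L2 within each block, L1 across blocks

w = torch.randn(512, 512, requires_grad=True)
task_loss = torch.tensor(0.0)          # stand-in for the vocoder training loss
loss = task_loss + 1e-4 * simd_group_lasso(w)
loss.backward()
print(w.grad.shape)                    # torch.Size([512, 512])
```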

Model architectures to extrapolate emotional expressions in DNN-based text-to-speech

no code implementations • 20 Feb 2021 • Katsuki Inoue, Sunao Hara, Masanobu Abe, Nobukatsu Hojo, Yusuke Ijima

In this study, "extrapolating emotional expressions" means borrowing emotional expressions from other speakers, so that no emotional speech needs to be collected from the target speakers.

V2S attack: building DNN-based voice conversion from automatic speaker verification

no code implementations • 5 Aug 2019 • Taiki Nakamura, Yuki Saito, Shinnosuke Takamichi, Yusuke Ijima, Hiroshi Saruwatari

The experimental evaluation compares voices converted by the proposed method, which does not use the target speaker's voice data, with those from standard VC, which does.

Automatic Speech Recognition (ASR) +3
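The trick described in the title and abstract is to substitute an automatic speaker verification (ASV) model for actual target-speaker recordings: the VC model is trained so the ASV embedding of its output matches the target's enrolled embedding. The sketch below shows only this loss with stand-in modules; the paper's full pipeline is more involved, and none of these components are its actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in modules; a real system would use a trained VC network and a
# pretrained, frozen speaker-verification (ASV) embedding network.
vc_model = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 80))
asv_embed = nn.Sequential(nn.Linear(80, 192))
for p in asv_embed.parameters():
    p.requires_grad = False               # ASV model stays fixed

def v2s_style_loss(src_mel, target_asv_embedding):
    """Train VC so the ASV embedding of its output matches the target's.

    Substitutes an ASV model for target-speaker recordings; a simplified
    illustration, not the paper's full objective.
    """
    converted = vc_model(src_mel)                       # (B, T, 80) mel frames
    emb = asv_embed(converted).mean(dim=1)              # pooled utterance embedding
    return 1.0 - F.cosine_similarity(emb, target_asv_embedding, dim=-1).mean()

src = torch.randn(4, 200, 80)                # source-speaker mel-spectrograms
target_emb = torch.randn(4, 192)             # target's enrolled ASV embedding
loss = v2s_style_loss(src, target_emb)
loss.backward()                              # gradients reach vc_model only
print(float(loss))
```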
