Search Results for author: Yuki Saito

Found 24 papers, 4 papers with code

On permutation-invariant neural networks

no code implementations26 Mar 2024 Masanari Kimura, Ryotaro Shimizu, Yuki Hirakawa, Ryosuke Goto, Yuki Saito

From these observations, we show that Deep Sets, one of the well-known permutation-invariant neural networks, can be generalized in the sense of a quasi-arithmetic mean.
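
A minimal sketch of what such a quasi-arithmetic generalization of Deep Sets' pooling could look like, assuming an invertible element-wise map psi (this is an illustration, not the paper's construction):

```python
import torch

def quasi_arithmetic_pool(h, psi, psi_inv):
    # h: (n, d) per-element features phi(x_i); the pooled vector
    # psi^{-1}( mean_i psi(h_i) ) is invariant to the row order of h.
    return psi_inv(psi(h).mean(dim=0))

# psi = identity recovers Deep Sets' mean pooling;
# psi = exp with psi_inv = log gives a log-mean-exp pooling.
pooled = quasi_arithmetic_pool(torch.rand(5, 8), torch.exp, torch.log)
```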

Frame-Wise Breath Detection with Self-Training: An Exploration of Enhancing Breath Naturalness in Text-to-Speech

no code implementations1 Feb 2024 Dong Yang, Tomoki Koriyama, Yuki Saito

Developing Text-to-Speech (TTS) systems that can synthesize natural breath is essential for human-like voice agents but requires extensive manual annotation of breath positions in training data.

StyleCap: Automatic Speaking-Style Captioning from Speech Based on Speech and Language Self-supervised Learning Models

no code implementations28 Nov 2023 Kazuki Yamauchi, Yusuke Ijima, Yuki Saito

The experimental results demonstrate that our StyleCap, which leverages richer LLMs for the text decoder, speech self-supervised learning (SSL) features, and sentence-rephrasing augmentation, improves the accuracy and diversity of generated speaking-style captions.

Language Modelling, Large Language Model +2

Outfit Completion via Conditional Set Transformation

no code implementations28 Nov 2023 Takuma Nakamura, Yuki Saito, Ryosuke Goto

In this paper, we formulate the outfit completion problem as a set retrieval task and propose a novel framework for solving this problem.

Retrieval

Virtual Human Generative Model: Masked Modeling Approach for Learning Human Characteristics

no code implementations19 Jun 2023 Kenta Oono, Nontawat Charoenphakdee, Kotatsu Bito, Zhengyan Gao, Yoshiaki Ota, Shoichiro Yamaguchi, Yohei Sugawara, Shin-ichi Maeda, Kunihiko Miyoshi, Yuki Saito, Koki Tsuda, Hiroshi Maruyama, Kohei Hayashi

In this paper, we propose the Virtual Human Generative Model (VHGM), a machine learning model for estimating healthcare, lifestyle, and personality attributes.
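
As a rough illustration of the masked-modeling idea (hypothetical architecture and names, not the authors' VHGM): hide a random subset of a person's attributes and train the model to reconstruct them from the observed ones.

```python
import torch
import torch.nn as nn

class MaskedAttributeModel(nn.Module):
    def __init__(self, n_attrs, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_attrs * 2, hidden), nn.ReLU(),
            nn.Linear(hidden, n_attrs))

    def forward(self, x, mask):
        # mask: 1 where an attribute is observed, 0 where it must be predicted
        return self.net(torch.cat([x * mask, mask], dim=-1))

model = MaskedAttributeModel(n_attrs=32)
x = torch.randn(8, 32)                    # 8 records, 32 attributes
mask = (torch.rand(8, 32) > 0.3).float()  # hide ~30% of the attributes
loss = (((model(x, mask) - x) ** 2) * (1 - mask)).mean()  # masked entries only
```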

CALLS: Japanese Empathetic Dialogue Speech Corpus of Complaint Handling and Attentive Listening in Customer Center

no code implementations23 May 2023 Yuki Saito, Eiji Iimori, Shinnosuke Takamichi, Kentaro Tachibana, Hiroshi Saruwatari

We present CALLS, a Japanese speech corpus that considers phone calls in a customer center as a new domain of empathetic spoken dialogue.

Speech Synthesis

ChatGPT-EDSS: Empathetic Dialogue Speech Synthesis Trained from ChatGPT-derived Context Word Embeddings

no code implementations23 May 2023 Yuki Saito, Shinnosuke Takamichi, Eiji Iimori, Kentaro Tachibana, Hiroshi Saruwatari

We focus on ChatGPT's reading comprehension and introduce it to EDSS, a task of synthesizing speech that can empathize with the interlocutor's emotion.

Chatbot, Reading Comprehension +2

Multi-Task Adversarial Training Algorithm for Multi-Speaker Neural Text-to-Speech

no code implementations26 Sep 2022 Yusuke Nakai, Yuki Saito, Kenta Udagawa, Hiroshi Saruwatari

A conventional generative adversarial network (GAN)-based training algorithm significantly improves the quality of synthetic speech by reducing the statistical difference between natural and synthetic speech.

Generative Adversarial Network

Human-in-the-loop Speaker Adaptation for DNN-based Multi-speaker TTS

no code implementations21 Jun 2022 Kenta Udagawa, Yuki Saito, Hiroshi Saruwatari

With a conventional speaker-adaptation method, a target speaker's embedding vector is extracted from his/her reference speech using a speaker encoder trained on a speaker-discriminative task.

Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History

no code implementations16 Jun 2022 Yuto Nishimura, Yuki Saito, Shinnosuke Takamichi, Kentaro Tachibana, Hiroshi Saruwatari

To train the empathetic DSS model effectively, we investigate 1) a self-supervised learning model pretrained with large speech corpora, 2) a style-guided training using a prosody embedding of the current utterance to be predicted by the dialogue context embedding, 3) a cross-modal attention to combine text and speech modalities, and 4) a sentence-wise embedding to achieve fine-grained prosody modeling rather than utterance-wise modeling.

Self-Supervised Learning, Sentence +2
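
The cross-modal attention in point 3) could, for instance, be realized with standard attention in which text features query speech features from the dialogue history; a hedged sketch with illustrative dimensions, not the paper's exact module:

```python
import torch
import torch.nn as nn

# Text features (query) attend over speech features of the
# dialogue history (key/value).
attn = nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)
text = torch.randn(1, 20, 256)    # current-utterance text features
speech = torch.randn(1, 50, 256)  # dialogue-history speech features
fused, _ = attn(text, speech, speech)  # (1, 20, 256)
```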

STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly Voice Agent

no code implementations28 Mar 2022 Yuki Saito, Yuto Nishimura, Shinnosuke Takamichi, Kentaro Tachibana, Hiroshi Saruwatari

We describe our methodology to construct an empathetic dialogue speech corpus and report the analysis results of the STUDIES corpus.

SHIFT15M: Fashion-specific dataset for set-to-set matching with several distribution shifts

2 code implementations30 Aug 2021 Masanari Kimura, Takuma Nakamura, Yuki Saito

This paper addresses the problem of set-to-set matching, which involves matching two different sets of items based on some criteria, especially in the case of high-dimensional items like images.

BIG-bench Machine Learning, set matching

Exchangeable deep neural networks for set-to-set matching and learning

2 code implementations ECCV 2020 Yuki Saito, Takuma Nakamura, Hirotaka Hachiya, Kenji Fukumizu

The task of matching two different sets of items, known as the heterogeneous set-to-set matching problem, has recently received attention.

set matching
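
A minimal, hypothetical sketch of a permutation-invariant set-to-set matching score (encode items, pool within each set, score the pair); the paper's exchangeable layers are more elaborate than this:

```python
import torch
import torch.nn as nn

class SetMatcher(nn.Module):
    def __init__(self, d_in, d_h=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, d_h), nn.ReLU(),
                                 nn.Linear(d_h, d_h))
        self.score = nn.Linear(2 * d_h, 1)

    def forward(self, X, Y):  # X: (n, d_in), Y: (m, d_in), any n and m
        # Mean pooling makes the score invariant to item order in each set.
        zx, zy = self.enc(X).mean(dim=0), self.enc(Y).mean(dim=0)
        return self.score(torch.cat([zx, zy]))  # higher = better match

matcher = SetMatcher(d_in=512)
s = matcher(torch.randn(4, 512), torch.randn(6, 512))
```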

HumanGAN: generative adversarial network with human-based discriminator and its evaluation in speech perception modeling

no code implementations25 Sep 2019 Kazuki Fujii, Yuki Saito, Shinnosuke Takamichi, Yukino Baba, Hiroshi Saruwatari

To model the human-acceptable distribution, we formulate a backpropagation-based generator training algorithm by regarding human perception as a black-boxed discriminator.

Generative Adversarial Network
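
Treating human perception as a black-boxed discriminator means its gradient must be estimated rather than backpropagated; one standard approach is a perturbation-based (NES-style) estimator, sketched here as an assumption rather than the paper's exact procedure:

```python
import numpy as np

def blackbox_grad(score, x, sigma=0.01, n=16):
    # Perturb x with Gaussian noise, collect black-box scores (standing in
    # for human ratings), and average the score-weighted perturbations.
    eps = np.random.randn(n, x.size)
    scores = np.array([score(x + sigma * e.reshape(x.shape)) for e in eps])
    return (scores @ eps).reshape(x.shape) / (n * sigma)

g = blackbox_grad(lambda v: -float(np.sum(v ** 2)), np.zeros(10))
```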

V2S attack: building DNN-based voice conversion from automatic speaker verification

no code implementations5 Aug 2019 Taiki Nakamura, Yuki Saito, Shinnosuke Takamichi, Yusuke Ijima, Hiroshi Saruwatari

The experimental evaluation compares voices converted by the proposed method, which does not use the target speaker's voice data, with those produced by standard VC, which does.

Automatic Speech Recognition (ASR) +3

DNN-based Speaker Embedding Using Subjective Inter-speaker Similarity for Multi-speaker Modeling in Speech Synthesis

no code implementations19 Jul 2019 Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari

Although a conventional DNN-based speaker embedding such as the $d$-vector can be applied to multi-speaker modeling in speech synthesis, it does not correlate with subjective inter-speaker similarity and is not necessarily an appropriate speaker representation for open speakers whose utterances are not included in the training data.

Speech Synthesis
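
For reference, a $d$-vector is typically obtained by averaging frame-level bottleneck activations of a network trained for speaker classification; a hypothetical sketch:

```python
import torch
import torch.nn as nn

class SpeakerEncoder(nn.Module):
    def __init__(self, n_mels=40, dim=256, n_speakers=100):
        super().__init__()
        self.bottleneck = nn.Sequential(nn.Linear(n_mels, dim), nn.ReLU(),
                                        nn.Linear(dim, dim))
        self.classifier = nn.Linear(dim, n_speakers)  # used only in training

    def d_vector(self, frames):                 # frames: (T, n_mels)
        return self.bottleneck(frames).mean(dim=0)  # average over frames

enc = SpeakerEncoder()
emb = enc.d_vector(torch.randn(200, 40))  # (256,)
```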

Generative Moment Matching Network-based Random Modulation Post-filter for DNN-based Singing Voice Synthesis and Neural Double-tracking

no code implementations9 Feb 2019 Hiroki Tamaru, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari

To address this problem, we use a GMMN to model the variation of the modulation spectrum of the pitch contour of natural singing voices and add a randomized inter-utterance variation to the pitch contour generated by conventional DNN-based singing voice synthesis.

Singing Voice Synthesis
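
A GMMN is trained by minimizing the maximum mean discrepancy (MMD) between generated and natural samples; a minimal MMD loss with a Gaussian kernel, assuming 2-D batches of modulation-spectrum features, might look like:

```python
import torch

def mmd_loss(x, y, sigma=1.0):
    # Squared MMD with a Gaussian kernel between generated (x) and
    # natural (y) feature batches, each of shape (batch, feature_dim).
    def k(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

loss = mmd_loss(torch.randn(32, 64), torch.randn(32, 64))
```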

Phase reconstruction from amplitude spectrograms based on von-Mises-distribution deep neural network

2 code implementations10 Jul 2018 Shinnosuke Takamichi, Yuki Saito, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari

This paper presents a deep neural network (DNN)-based phase reconstruction from amplitude spectrograms.

Sound, Audio and Speech Processing
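
Modeling phase with a von Mises distribution yields a loss based on the cosine of the phase error, which naturally respects phase wrapping; a sketch that drops the distribution's constant log-normalizer:

```python
import torch

def von_mises_nll(phase_true, phase_pred, kappa=1.0):
    # 1 - cos(error) is minimal when the predicted and true phases agree
    # modulo 2*pi, so phase wrapping needs no special handling.
    return (kappa * (1.0 - torch.cos(phase_true - phase_pred))).mean()

loss = von_mises_nll(torch.rand(100) * 6.28, torch.rand(100) * 6.28)
```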

Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks

4 code implementations23 Sep 2017 Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari

In the proposed framework incorporating the GANs, the discriminator is trained to distinguish natural and generated speech parameters, while the acoustic models are trained to minimize the weighted sum of the conventional minimum generation loss and an adversarial loss for deceiving the discriminator.

Speech Synthesis, Voice Conversion
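
The weighted-sum objective described above can be sketched as follows (hedged: `discriminator` stands for any callable returning a naturalness probability, and `w_adv` is the weighting hyperparameter):

```python
import torch

def acoustic_model_loss(y_nat, y_gen, discriminator, w_adv=1.0):
    # Minimum generation error plus a weighted adversarial term that
    # rewards generated parameters the discriminator judges natural.
    l_mge = torch.mean((y_gen - y_nat) ** 2)
    l_adv = -torch.log(discriminator(y_gen) + 1e-8).mean()
    return l_mge + w_adv * l_adv

disc = lambda y: torch.sigmoid(y.mean(dim=-1))  # stand-in discriminator
loss = acoustic_model_loss(torch.randn(8, 80), torch.randn(8, 80), disc)
```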

Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities

no code implementations10 Apr 2017 Hiroyuki Miyoshi, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari

Conventional VC using shared context posterior probabilities predicts target speech parameters from the context posterior probabilities estimated from the source speech parameters.

Speech Recognition +2
