Search Results for author: Yuki Saito

Found 12 papers, 4 papers with code

SHIFT15M: Multiobjective Large-Scale Fashion Dataset with Distributional Shifts

1 code implementation • 30 Aug 2021 • Masanari Kimura, Takuma Nakamura, Yuki Saito

In this paper, we propose SHIFT15M, a dataset that can be used to properly evaluate models in situations where the distribution of data changes between training and testing.

HumanACGAN: conditional generative adversarial network with human-based auxiliary classifier and its evaluation in phoneme perception

no code implementations • 8 Feb 2021 • Yota Ueda, Kazuki Fujii, Yuki Saito, Shinnosuke Takamichi, Yukino Baba, Hiroshi Saruwatari

A DNN-based generator is trained using a human-based discriminator, i.e., humans' perceptual evaluations, instead of the GAN's DNN-based discriminator.
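The snippet above describes replacing the GAN's DNN discriminator with listeners who judge both naturalness and, via the auxiliary classifier, whether a stimulus matches its conditioning phoneme. Below is a minimal sketch of how such a human-based, conditional objective could be assembled for a batch of generated stimuli; the simulated listener function, the score names, and the weight `lambda_aux` are illustrative assumptions, not the paper's protocol.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulated_listeners(stimuli, target_phonemes):
    """Placeholder for a crowdsourced listening test. Returns, for each stimulus,
    a perceived-naturalness score and a 'sounds like the target phoneme' score,
    both in [0, 1]. In the real setting these come from human raters."""
    n = len(stimuli)
    return rng.uniform(0.4, 0.9, n), rng.uniform(0.3, 1.0, n)

stimuli = [f"generated_utterance_{i}.wav" for i in range(3)]   # generator outputs
target_phonemes = ["a", "i", "u"]                              # conditioning labels

naturalness, phoneme_match = simulated_listeners(stimuli, target_phonemes)

# Auxiliary-classifier-style objective: reward perceived realism plus agreement
# with the conditioning phoneme. lambda_aux is an assumed trade-off weight.
lambda_aux = 0.5
generator_reward = (naturalness + lambda_aux * phoneme_match).mean()
print(generator_reward)
```

Because listeners return scores but no gradients, the generator cannot be updated by ordinary backpropagation through this objective; one standard way to estimate the missing gradient is sketched under the HumanGAN entry below.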

Exchangeable deep neural networks for set-to-set matching and learning

1 code implementation • ECCV 2020 • Yuki Saito, Takuma Nakamura, Hirotaka Hachiya, Kenji Fukumizu

Matching two different sets of items, called the heterogeneous set-to-set matching problem, has recently received attention as a promising problem.

Set Matching

HumanGAN: generative adversarial network with human-based discriminator and its evaluation in speech perception modeling

no code implementations • 25 Sep 2019 • Kazuki Fujii, Yuki Saito, Shinnosuke Takamichi, Yukino Baba, Hiroshi Saruwatari

To model the human-acceptable distribution, we formulate a backpropagation-based generator training algorithm by regarding human perception as a black-boxed discriminator.
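Since human raters are a black box, the gradient of their "discriminator" with respect to the generated features has to be estimated from perturbed stimuli rather than backpropagated. The sketch below uses a generic NES/SPSA-style finite-difference estimator with a toy proxy standing in for the listening test; the proxy function, step size, and perturbation scale are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def black_box_discriminator(x):
    """Stand-in for the human perceptual evaluation of a speech-feature vector x:
    returns a scalar acceptability score in (0, 1]. In the real setting this is a
    crowdsourced listening test, so no gradients are available."""
    return np.exp(-0.5 * np.mean((x - 0.3) ** 2))  # toy proxy only

def estimated_gradient(x, n_perturb=16, sigma=0.05):
    """Finite-difference (NES/SPSA-style) estimate of dD(x)/dx from paired
    perturbations. A generic black-box estimator assumed for illustration."""
    grad = np.zeros_like(x)
    for _ in range(n_perturb):
        delta = rng.normal(scale=sigma, size=x.shape)
        grad += (black_box_discriminator(x + delta)
                 - black_box_discriminator(x - delta)) * delta
    return grad / (2 * n_perturb * sigma ** 2)

# "Backpropagation-like" updates on the generator's output: nudge the generated
# features toward a higher (estimated) human acceptability score.
x = rng.normal(size=8)   # features produced by the generator
lr = 1.0
for _ in range(30):
    x = x + lr * estimated_gradient(x)
print(black_box_discriminator(x))  # acceptability should have increased
```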

V2S attack: building DNN-based voice conversion from automatic speaker verification

no code implementations • 5 Aug 2019 • Taiki Nakamura, Yuki Saito, Shinnosuke Takamichi, Yusuke Ijima, Hiroshi Saruwatari

The experimental evaluation compares voices converted by the proposed method, which does not use the target speaker's voice data, with those converted by standard VC, which does.

Automatic Speech Recognition • Speaker Verification +2

DNN-based Speaker Embedding Using Subjective Inter-speaker Similarity for Multi-speaker Modeling in Speech Synthesis

no code implementations • 19 Jul 2019 • Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari

Although conventional DNN-based speaker embedding such as a $d$-vector can be applied to multi-speaker modeling in speech synthesis, it does not correlate with subjective inter-speaker similarity and is not necessarily an appropriate speaker representation for open speakers whose speech utterances are not included in the training data.

Speech Quality • Speech Synthesis

Generative Moment Matching Network-based Random Modulation Post-filter for DNN-based Singing Voice Synthesis and Neural Double-tracking

no code implementations • 9 Feb 2019 • Hiroki Tamaru, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari

To address this problem, we use a GMMN to model the variation of the modulation spectrum of the pitch contour of natural singing voices and add a randomized inter-utterance variation to the pitch contour generated by conventional DNN-based singing voice synthesis. (A rough code sketch of this post-filtering idea follows the entry.)

Singing Voice Synthesis • Speech Quality
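For the GMMN post-filter entry above, the operation in the snippet can be pictured as: take the over-smooth pitch contour produced by the synthesizer, move to its modulation spectrum, inject a random per-utterance variation, and transform back. The sketch below uses white Gaussian noise as a hypothetical stand-in for samples from a trained GMMN, and a synthetic contour in place of real synthesizer output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder log-F0 contour as produced by a DNN singing-voice synthesizer
# (256 frames around A3 = 220 Hz with a slow vibrato-like wobble).
t = np.arange(256)
log_f0 = np.log(220.0) + 0.05 * np.sin(2 * np.pi * t / 64)

# Modulation spectrum of the contour: Fourier transform of the mean-removed trajectory.
spec = np.fft.rfft(log_f0 - log_f0.mean())

# Random inter-utterance variation per modulation-frequency bin. In the paper this
# randomness comes from a GMMN trained to match natural singing; white Gaussian
# noise with a fixed scale is used here purely as a stand-in.
variation = 1.0 + 0.1 * rng.standard_normal(spec.shape)

# Post-filtered contour: apply the variation and transform back. Drawing a new
# `variation` for every utterance (or every take, for double-tracking) yields
# contours that differ slightly from one another, as natural singing does.
log_f0_pf = np.fft.irfft(spec * variation, n=len(log_f0)) + log_f0.mean()
```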

Phase reconstruction from amplitude spectrograms based on von-Mises-distribution deep neural network

2 code implementations • 10 Jul 2018 • Shinnosuke Takamichi, Yuki Saito, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari

This paper presents a deep neural network (DNN)-based method for phase reconstruction from amplitude spectrograms. (A sketch of the underlying von Mises loss follows the entry.)

Sound • Audio and Speech Processing
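The phase-reconstruction entry above models each phase value with a von Mises distribution, which is the natural choice for a quantity that wraps around at 2π. A minimal sketch of the corresponding negative log-likelihood is below; the fixed concentration `kappa` and the toy inputs are assumptions, and the paper's exact network and parameterization may differ.

```python
import numpy as np

def von_mises_nll(predicted_phase, true_phase, kappa=1.0):
    """Negative log-likelihood of the true phase under a von Mises distribution
    centred on the DNN's prediction. With kappa fixed, minimizing this amounts to
    maximizing cos(error), so a prediction off by 2*pi costs nothing: the loss
    respects the circular nature of phase, unlike a plain squared error."""
    return np.mean(-kappa * np.cos(true_phase - predicted_phase)
                   + np.log(2.0 * np.pi * np.i0(kappa)))

# Toy check: predictions off by small circular errors, including a wrap-around
# near +/- pi, still receive a small loss.
true_phase = np.array([0.10, 3.10, -3.10, 1.50, -0.50])
pred_phase = np.array([0.15, -3.10, 3.05, 1.50, -0.40])
print(von_mises_nll(pred_phase, true_phase))
```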

Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks

4 code implementations • 23 Sep 2017 • Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari

In the proposed framework incorporating the GANs, the discriminator is trained to distinguish natural and generated speech parameters, while the acoustic models are trained to minimize the weighted sum of the conventional minimum generation loss and an adversarial loss for deceiving the discriminator. (A minimal code sketch of this combined objective follows the entry.)

Speech Quality • Speech Synthesis +1
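The snippet above defines the acoustic-model objective as a weighted sum of the conventional minimum generation loss and an adversarial loss for fooling the discriminator. A minimal PyTorch sketch of that combination is below; the toy shapes, the small discriminator, and the fixed weight `w_adv` are illustrative assumptions (the paper balances the two terms in its own way), not the released implementation.

```python
import torch
import torch.nn.functional as F

# Toy shapes: a batch of 8 frames of 25-dimensional speech parameters.
natural = torch.randn(8, 25)                          # natural speech parameters
generated = torch.randn(8, 25, requires_grad=True)    # acoustic-model outputs

# Placeholder discriminator: any DNN mapping speech parameters to P(natural).
discriminator = torch.nn.Sequential(
    torch.nn.Linear(25, 16), torch.nn.ReLU(),
    torch.nn.Linear(16, 1), torch.nn.Sigmoid())

# Acoustic-model loss from the snippet: conventional minimum generation loss
# plus a weighted adversarial loss that rewards deceiving the discriminator.
mge_loss = F.mse_loss(generated, natural)
adv_loss = F.binary_cross_entropy(discriminator(generated), torch.ones(8, 1))
w_adv = 1.0                                           # assumed weight
acoustic_loss = mge_loss + w_adv * adv_loss
acoustic_loss.backward()        # gradients flow back into the acoustic model

# The discriminator is trained separately to tell natural from generated frames.
d_loss = (F.binary_cross_entropy(discriminator(natural), torch.ones(8, 1))
          + F.binary_cross_entropy(discriminator(generated.detach()), torch.zeros(8, 1)))
```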

Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities

no code implementations • 10 Apr 2017 • Hiroyuki Miyoshi, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari

Conventional VC using shared context posterior probabilities predicts target speech parameters from the context posterior probabilities estimated from the source speech parameters. (A toy sketch of this conventional pipeline follows the entry.)

Speech Recognition • Speech Synthesis +1
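The snippet above describes the conventional pipeline this paper builds on: an ASR-style recognizer turns source speech parameters into context (e.g. phoneme) posterior probabilities, and a synthesizer predicts target speech parameters from those posteriors. The toy sketch below stands in for that two-stage pipeline with placeholder linear maps; the dimensions and model names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FRAMES, DIM, N_CTX = 100, 40, 50    # frames, speech-parameter dim, context classes

# Placeholder "models": linear maps standing in for trained DNNs.
W_recognize = 0.1 * rng.normal(size=(DIM, N_CTX))    # source params -> context logits
W_synthesize = 0.1 * rng.normal(size=(N_CTX, DIM))   # posteriors -> target params

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

source_params = rng.normal(size=(N_FRAMES, DIM))            # source speaker's parameters
context_posteriors = softmax(source_params @ W_recognize)   # shared linguistic representation
target_params = context_posteriors @ W_synthesize           # predicted target speech parameters
```

Per the title, the proposed method additionally converts the posterior sequence itself with sequence-to-sequence learning before synthesis; that step is not shown here.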
