no code implementations • LREC 2022 • Nobukatsu Hojo, Satoshi Kobashikawa, Saki Mizuno, Ryo Masumura
To investigate SPNOP, a corpus with various psychological measurements is beneficial because the interaction process of negotiation relates to many aspects of psychology.
1 code implementation • 15 Jan 2025 • Kazutoshi Shinoda, Nobukatsu Hojo, Kyosuke Nishida, Saki Mizuno, Keita Suzuki, Ryo Masumura, Hiroaki Sugiyama, Kuniko Saito
These verbalized thoughts serve as answers to questions designed to assess the mental states of characters within conversations.
no code implementations • 4 Jun 2023 • Ryo Masumura, Naoki Makishima, Taiga Yamane, Yoshihiko Yamazaki, Saki Mizuno, Mana Ihori, Mihiro Uchida, Keita Suzuki, Hiroshi Sato, Tomohiro Tanaka, Akihiko Takashima, Satoshi Suzuki, Takafumi Moriya, Nobukatsu Hojo, Atsushi Ando
Target-speaker ASR systems are a promising way to transcribe only a target speaker's speech by enrolling that speaker's information.
Automatic Speech Recognition (ASR)
no code implementations • 24 May 2023 • Hiroshi Sato, Ryo Masumura, Tsubasa Ochiai, Marc Delcroix, Takafumi Moriya, Takanori Ashihara, Kentaro Shinayama, Saki Mizuno, Mana Ihori, Tomohiro Tanaka, Nobukatsu Hojo
In this work, we propose a new SE training criterion that minimizes the distance between clean and enhanced signals in the feature representation of the SSL model to alleviate the mismatch.
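The criterion described above can be illustrated with a minimal sketch. It assumes a mean squared distance and uses a hypothetical `toy_features` function as a stand-in for the frozen SSL model; neither the distance metric nor the feature extractor is taken from the paper.

```python
def toy_features(signal, frame_len=4):
    """Hypothetical stand-in for a frozen SSL encoder (assumption, not the paper's model).

    Splits the signal into non-overlapping frames and returns
    (mean, energy) per frame.
    """
    feats = []
    for i in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[i:i + frame_len]
        mean = sum(frame) / frame_len
        energy = sum(x * x for x in frame) / frame_len
        feats.append((mean, energy))
    return feats


def feature_distance(clean, enhanced, extractor=toy_features):
    """Mean squared distance between clean and enhanced signals in feature space."""
    f_clean = extractor(clean)
    f_enh = extractor(enhanced)
    if len(f_clean) != len(f_enh) or not f_clean:
        raise ValueError("signals must yield the same non-zero number of frames")
    total, count = 0.0, 0
    for a, b in zip(f_clean, f_enh):
        for x, y in zip(a, b):
            total += (x - y) ** 2
            count += 1
    return total / count


clean = [0.0, 1.0, 0.0, -1.0] * 4
enhanced = [x + 0.1 for x in clean]  # pretend output of an enhancement front-end
loss = feature_distance(clean, enhanced)
```

In a real training setup the extractor would be a pretrained SSL network with frozen weights, and the loss would be backpropagated through it into the enhancement model only.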
3 code implementations • 25 Feb 2021 • Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo
With FIF, we apply a temporal mask to the input mel-spectrogram and encourage the converter to fill in missing frames based on surrounding frames.
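A minimal sketch of the masking step follows. It assumes the mask hides one contiguous run of frames by zeroing them and that the start position is drawn uniformly; the function name `apply_temporal_mask` and these choices are illustrative assumptions, not details confirmed by the excerpt.

```python
import random


def apply_temporal_mask(mel, mask_width, rng):
    """Zero out `mask_width` consecutive frames of a mel-spectrogram.

    mel: list of rows with shape (n_mels, n_frames).
    Returns (masked_mel, mask), where mask[t] == 0 marks a hidden frame.
    """
    n_frames = len(mel[0])
    if not 0 <= mask_width <= n_frames:
        raise ValueError("mask_width must be between 0 and n_frames")
    # Assumption: the start frame is sampled uniformly at random.
    start = rng.randint(0, n_frames - mask_width)
    mask = [0 if start <= t < start + mask_width else 1 for t in range(n_frames)]
    masked = [[v * m for v, m in zip(row, mask)] for row in mel]
    return masked, mask


rng = random.Random(0)
mel = [[float(f + 1) for f in range(8)] for _ in range(3)]  # toy 3 x 8 input
masked, mask = apply_temporal_mask(mel, 3, rng)
```

The converter would then receive `masked` (and possibly `mask`) as input and be trained to reproduce the hidden frames from the surrounding context.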
no code implementations • 20 Feb 2021 • Katsuki Inoue, Sunao Hara, Masanobu Abe, Nobukatsu Hojo, Yusuke Ijima
In this study, to "extrapolate emotional expressions" means to borrow emotional expressions from other speakers, so collecting emotional speech uttered by the target speakers is unnecessary.
2 code implementations • 22 Oct 2020 • Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo
To address this, we examined the applicability of CycleGAN-VC/VC2 to mel-spectrogram conversion.
1 code implementation • 27 Aug 2020 • Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo
We previously proposed a method that allows for nonparallel voice conversion (VC) by using a variant of generative adversarial networks (GANs) called StarGAN.
no code implementations • 18 May 2020 • Hirokazu Kameoka, Wen-Chin Huang, Kou Tanaka, Takuhiro Kaneko, Nobukatsu Hojo, Tomoki Toda
The main idea we propose is an extension of the original VTN that can simultaneously learn mappings among multiple speakers.
3 code implementations • 29 Jul 2019 • Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo
To bridge this gap, we rethink conditional methods of StarGAN-VC, which are key components for achieving non-parallel multi-domain VC in a single model, and propose an improved variant called StarGAN-VC2.
6 code implementations • 9 Apr 2019 • Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo
Non-parallel voice conversion (VC) is a technique for learning the mapping from source to target speech without relying on parallel data.
no code implementations • 5 Apr 2019 • Kou Tanaka, Hirokazu Kameoka, Takuhiro Kaneko, Nobukatsu Hojo
WaveCycleGAN has recently been proposed to bridge the gap between natural and synthesized speech waveforms in statistical parametric speech synthesis. It provides fast inference by using a moving average model rather than an autoregressive model, and high-quality speech synthesis through adversarial training.
no code implementations • 9 Nov 2018 • Kou Tanaka, Hirokazu Kameoka, Takuhiro Kaneko, Nobukatsu Hojo
This paper describes a method based on sequence-to-sequence (Seq2Seq) learning with attention and a context preservation mechanism for voice conversion (VC) tasks.
no code implementations • 5 Nov 2018 • Hirokazu Kameoka, Kou Tanaka, Damian Kwasny, Takuhiro Kaneko, Nobukatsu Hojo
Second, it achieves many-to-many conversion by simultaneously learning mappings among multiple speakers with a single model, instead of learning a separate model for each speaker pair.
no code implementations • 25 Sep 2018 • Kou Tanaka, Takuhiro Kaneko, Nobukatsu Hojo, Hirokazu Kameoka
The experimental results demonstrate that our proposed method can 1) alleviate the over-smoothing effect of the acoustic features despite the direct modification method used for the waveform and 2) greatly improve the naturalness of the generated speech sounds.
2 code implementations • 13 Aug 2018 • Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo
Such situations can be avoided by introducing an auxiliary classifier and training the encoder and decoder so that the attribute classes of the decoder outputs are correctly predicted by the classifier.
14 code implementations • 6 Jun 2018 • Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo
This paper proposes a method that allows non-parallel many-to-many voice conversion (VC) by using a variant of a generative adversarial network (GAN) called StarGAN.
no code implementations • 6 Apr 2018 • Keisuke Oyamada, Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo, Hiroyasu Ando
In this paper, we address the problem of reconstructing a time-domain signal (or a phase spectrogram) solely from a magnitude spectrogram.