Search Results for author: Takuhiro Kaneko

Found 31 papers, 14 papers with code

Training Generative Adversarial Network-Based Vocoder with Limited Data Using Augmentation-Conditional Discriminator

no code implementations 25 Mar 2024 Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka

A generative adversarial network (GAN)-based vocoder trained with an adversarial discriminator is commonly used for speech synthesis because of its fast, lightweight, and high-quality characteristics.

Data Augmentation, Generative Adversarial Network, +1

Unsupervised Intrinsic Image Decomposition with LiDAR Intensity Enhanced Training

no code implementations 21 Mar 2024 Shogo Sato, Takuhiro Kaneko, Kazuhiko Murasaki, Taiga Yoshida, Ryuichi Tanida, Akisato Kimura

To address this challenge, we propose a novel approach that utilizes only an image during inference while utilizing an image and LiDAR intensity during training.

Intrinsic Image Decomposition

MIMO-NeRF: Fast Neural Rendering with Multi-input Multi-output Neural Radiance Fields

no code implementations ICCV 2023 Takuhiro Kaneko

We propose a multi-input multi-output NeRF (MIMO-NeRF) that reduces the number of MLPs running by replacing the SISO MLP with a MIMO MLP and conducting mappings in a group-wise manner.

Neural Rendering, Novel View Synthesis, +1

iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN

no code implementations 14 Aug 2023 Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Shogo Seki

Owing to the difficulty of a 1D CNN to model high-dimensional spectrograms, the frequency dimension is reduced via temporal upsampling.

Speech Synthesis

AR-NeRF: Unsupervised Learning of Depth and Defocus Effects from Natural Images with Aperture Rendering Neural Radiance Fields

no code implementations CVPR 2022 Takuhiro Kaneko

As an alternative to an AR-GAN, we propose an aperture rendering NeRF (AR-NeRF), which can utilize viewpoint and defocus cues in a unified manner by representing both factors in a common ray-tracing framework.

Representation Learning

iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform

1 code implementation 4 Mar 2022 Takuhiro Kaneko, Kou Tanaka, Hirokazu Kameoka, Shogo Seki

In recent text-to-speech synthesis and voice conversion systems, a mel-spectrogram is commonly applied as an intermediate representation, and the necessity for a mel-spectrogram vocoder is increasing.

Speech Synthesis, Text-To-Speech Synthesis, +1

MaskCycleGAN-VC: Learning Non-parallel Voice Conversion with Filling in Frames

3 code implementations 25 Feb 2021 Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo

With FIF, we apply a temporal mask to the input mel-spectrogram and encourage the converter to fill in missing frames based on surrounding frames.

Voice Conversion
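
The temporal masking step described in the MaskCycleGAN-VC entry above can be illustrated roughly as follows. This is a minimal sketch, not the authors' implementation: the function name `apply_temporal_mask`, the `(n_mels, n_frames)` layout, zero-filling of the masked frames, and returning the mask alongside the masked input are all assumptions made for illustration.

```python
import numpy as np

def apply_temporal_mask(mel, start, width):
    """Zero out `width` consecutive frames of a (n_mels, n_frames) mel-spectrogram.

    Returns the masked spectrogram and the binary mask (1 = kept, 0 = missing).
    The name, layout, and zero-filling are illustrative assumptions.
    """
    n_mels, n_frames = mel.shape
    mask = np.ones((1, n_frames), dtype=mel.dtype)
    mask[:, start:start + width] = 0.0   # frames the converter would have to fill in
    return mel * mask, mask              # mask broadcasts over the mel axis

# Hypothetical input: 80 mel bins, 64 frames of random values.
rng = np.random.default_rng(0)
mel = rng.standard_normal((80, 64)).astype(np.float32)
masked, mask = apply_temporal_mask(mel, start=20, width=10)
```

In such a setup, a converter would be trained to reconstruct the hidden frames from the surrounding ones; the training objective itself is outside the scope of this sketch.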

CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectrogram Conversion

2 code implementations 22 Oct 2020 Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo

To address this, we examined the applicability of CycleGAN-VC/VC2 to mel-spectrogram conversion.

Voice Conversion

Nonparallel Voice Conversion with Augmented Classifier Star Generative Adversarial Networks

1 code implementation 27 Aug 2020 Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo

We previously proposed a method that allows for nonparallel voice conversion (VC) by using a variant of generative adversarial networks (GANs) called StarGAN.

Voice Conversion

Many-to-Many Voice Transformer Network

no code implementations 18 May 2020 Hirokazu Kameoka, Wen-Chin Huang, Kou Tanaka, Takuhiro Kaneko, Nobukatsu Hojo, Tomoki Toda

The main idea we propose is an extension of the original VTN that can simultaneously learn mappings among multiple speakers.

Voice Conversion

Blur, Noise, and Compression Robust Generative Adversarial Networks

no code implementations CVPR 2021 Takuhiro Kaneko, Tatsuya Harada

However, in contrast to NR-GAN, to address irreversible characteristics, we introduce masking architectures adjusting degradation strength values in a data-driven manner using bypasses before and after degradation.

Image Generation, Image Restoration

Noise Robust Generative Adversarial Networks

2 code implementations CVPR 2020 Takuhiro Kaneko, Tatsuya Harada

Therefore, we propose distribution and transformation constraints that encourage the noise generator to capture only the noise-specific components.

Image Denoising, Image Generation

StarGAN-VC2: Rethinking Conditional Methods for StarGAN-Based Voice Conversion

3 code implementations 29 Jul 2019 Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo

To bridge this gap, we rethink conditional methods of StarGAN-VC, which are key components for achieving non-parallel multi-domain VC in a single model, and propose an improved variant called StarGAN-VC2.

Voice Conversion

Label-Noise Robust Multi-Domain Image-to-Image Translation

1 code implementation 6 May 2019 Takuhiro Kaneko, Tatsuya Harada

This problem is challenging in terms of scalability because it requires the learning of numerous mappings, the number of which increases proportional to the number of domains.

Image-to-Image Translation, Translation

CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion

6 code implementations 9 Apr 2019 Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo

Non-parallel voice conversion (VC) is a technique for learning the mapping from source to target speech without relying on parallel data.

Voice Conversion

Crossmodal Voice Conversion

no code implementations 9 Apr 2019 Hirokazu Kameoka, Kou Tanaka, Aaron Valero Puche, Yasunori Ohishi, Takuhiro Kaneko

We use the latent code of an input face image encoded by the face encoder as the auxiliary input into the speech converter and train the speech converter so that the original latent code can be recovered from the generated speech by the voice encoder.

Decoder, Voice Conversion

WaveCycleGAN2: Time-domain Neural Post-filter for Speech Waveform Generation

no code implementations 5 Apr 2019 Kou Tanaka, Hirokazu Kameoka, Takuhiro Kaneko, Nobukatsu Hojo

WaveCycleGAN has recently been proposed to bridge the gap between natural and synthesized speech waveforms in statistical parametric speech synthesis and provides fast inference with a moving average model rather than an autoregressive model and high-quality speech synthesis with the adversarial training.

Speech Synthesis

Class-Distinct and Class-Mutual Image Generation with GANs

2 code implementations 27 Nov 2018 Takuhiro Kaneko, Yoshitaka Ushiku, Tatsuya Harada

To overcome this limitation, we address a novel problem called class-distinct and class-mutual image generation, in which the goal is to construct a generator that can capture between-class relationships and generate an image selectively conditioned on the class specificity.

Conditional Image Generation, Image-to-Image Translation, +1

Label-Noise Robust Generative Adversarial Networks

3 code implementations CVPR 2019 Takuhiro Kaneko, Yoshitaka Ushiku, Tatsuya Harada

To remedy this, we propose a novel family of GANs called label-noise robust GANs (rGANs), which, by incorporating a noise transition model, can learn a clean label conditional generative distribution even when training labels are noisy.

Robust classification
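
To make the role of a noise transition model in the entry above more concrete, the sketch below maps clean-label probabilities to noisy-label probabilities with a row-stochastic matrix. It is only an illustration under stated assumptions: the matrix values, the convention `T[i, j] = p(noisy = j | clean = i)`, and the function name `noisy_label_probs` are hypothetical and are not taken from the paper.

```python
import numpy as np

def noisy_label_probs(clean_probs, transition):
    """Map clean-label probabilities to noisy-label probabilities.

    Assumes transition[i, j] = p(noisy label j | clean label i), so each
    row of `transition` sums to 1 (an illustrative convention).
    """
    return clean_probs @ transition

# Hypothetical 3-class symmetric noise: 80% of labels stay correct.
T = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])

clean = np.array([[1.0, 0.0, 0.0],    # confident prediction of class 0
                  [0.2, 0.5, 0.3]])   # uncertain prediction
noisy = noisy_label_probs(clean, T)
# noisy[0] is [0.8, 0.1, 0.1]; noisy[1] is [0.24, 0.45, 0.31]
```

The idea is that a classifier's clean-label output is passed through the transition before being compared against noisy training labels, so the model in front of the transition can still represent the clean distribution.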

AttS2S-VC: Sequence-to-Sequence Voice Conversion with Attention and Context Preservation Mechanisms

no code implementations 9 Nov 2018 Kou Tanaka, Hirokazu Kameoka, Takuhiro Kaneko, Nobukatsu Hojo

This paper describes a method based on a sequence-to-sequence learning (Seq2Seq) with attention and context preservation mechanism for voice conversion (VC) tasks.

Image Captioning, Machine Translation, +4

ConvS2S-VC: Fully convolutional sequence-to-sequence voice conversion

no code implementations 5 Nov 2018 Hirokazu Kameoka, Kou Tanaka, Damian Kwasny, Takuhiro Kaneko, Nobukatsu Hojo

Second, it achieves many-to-many conversion by simultaneously learning mappings among multiple speakers using only a single model instead of separately learning mappings between each speaker pair using a different model.

Speech Enhancement, Voice Conversion

WaveCycleGAN: Synthetic-to-natural speech waveform conversion using cycle-consistent adversarial networks

no code implementations 25 Sep 2018 Kou Tanaka, Takuhiro Kaneko, Nobukatsu Hojo, Hirokazu Kameoka

The experimental results demonstrate that our proposed method can 1) alleviate the over-smoothing effect of the acoustic features despite the direct modification method used for the waveform and 2) greatly improve the naturalness of the generated speech sounds.

Speech Synthesis, Voice Conversion

ACVAE-VC: Non-parallel many-to-many voice conversion with auxiliary classifier variational autoencoder

2 code implementations 13 Aug 2018 Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo

Such situations can be avoided by introducing an auxiliary classifier and training the encoder and decoder so that the attribute classes of the decoder outputs are correctly predicted by the classifier.

Attribute, Decoder, +1

StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks

13 code implementations 6 Jun 2018 Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo

This paper proposes a method that allows non-parallel many-to-many voice conversion (VC) by using a variant of a generative adversarial network (GAN) called StarGAN.

Attribute, Generative Adversarial Network, +1

Generative Adversarial Image Synthesis with Decision Tree Latent Controller

no code implementations CVPR 2018 Takuhiro Kaneko, Kaoru Hiramatsu, Kunio Kashino

This paper proposes the decision tree latent controller generative adversarial network (DTLC-GAN), an extension of a GAN that can learn hierarchically interpretable representations without relying on detailed supervision.

Generative Adversarial Network, Image Generation, +3

Generative adversarial network-based approach to signal reconstruction from magnitude spectrograms

no code implementations 6 Apr 2018 Keisuke Oyamada, Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo, Hiroyasu Ando

In this paper, we address the problem of reconstructing a time-domain signal (or a phase spectrogram) solely from a magnitude spectrogram.

Generative Adversarial Network

Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks

9 code implementations 30 Nov 2017 Takuhiro Kaneko, Hirokazu Kameoka

A subjective evaluation showed that the quality of the converted speech was comparable to that obtained with a Gaussian mixture model-based method under advantageous conditions with parallel and twice the amount of data.

Voice Conversion

Generative Attribute Controller With Conditional Filtered Generative Adversarial Networks

no code implementations CVPR 2017 Takuhiro Kaneko, Kaoru Hiramatsu, Kunio Kashino

This controller is based on a novel generative model called the conditional filtered generative adversarial network (CFGAN), which is an extension of the conventional conditional GAN (CGAN) that incorporates a filtering architecture into the generator input.

Attribute, Generative Adversarial Network, +2
