Search Results for author: Patrick Lumban Tobing

Found 14 papers, 9 papers with code

Expressive Machine Dubbing Through Phrase-level Cross-lingual Prosody Transfer

no code implementations • 20 Jun 2023 • Jakub Swiatkowski, Duo Wang, Mikolaj Babianski, Giuseppe Coccia, Patrick Lumban Tobing, Ravichander Vipperla, Viacheslav Klimkov, Vincent Pollet

Speech generation for machine dubbing adds complexity to conventional Text-To-Speech solutions as the generated output is required to match the expressiveness, emotion and speaking rate of the source content.

Cross-lingual Prosody Transfer for Expressive Machine Dubbing

no code implementations • 20 Jun 2023 • Jakub Swiatkowski, Duo Wang, Mikolaj Babianski, Patrick Lumban Tobing, Ravichander Vipperla, Vincent Pollet

Prosody transfer is well-studied in the context of expressive speech synthesis.

Expressive Speech Synthesis

A Cyclical Approach to Synthetic and Natural Speech Mismatch Refinement of Neural Post-filter for Low-cost Text-to-speech System

no code implementations • 13 Jul 2022 • Yi-Chiao Wu, Patrick Lumban Tobing, Kazuki Yasuhara, Noriyuki Matsunaga, Yamato Ohtani, Tomoki Toda

Neural-based text-to-speech (TTS) systems achieve very high-fidelity speech generation because of the rapid neural network developments.

Low-Latency Real-Time Non-Parallel Voice Conversion based on Cyclic Variational Autoencoder and Multiband WaveRNN with Data-Driven Linear Prediction

2 code implementations • 20 May 2021 • Patrick Lumban Tobing, Tomoki Toda

To accommodate LLRT constraint with CPU, we propose a novel CycleVAE framework that utilizes mel-spectrogram as spectral features and is built with a sparse network architecture.

Voice Conversion

High-Fidelity and Low-Latency Universal Neural Vocoder based on Multiband WaveRNN with Data-Driven Linear Prediction for Discrete Waveform Modeling

1 code implementation • 20 May 2021 • Patrick Lumban Tobing, Tomoki Toda

This paper presents a novel high-fidelity and low-latency universal neural vocoder framework based on multiband WaveRNN with data-driven linear prediction for discrete waveform modeling (MWDLP).

Low-latency processing

crank: An Open-Source Software for Nonparallel Voice Conversion Based on Vector-Quantized Variational Autoencoder

1 code implementation • 4 Mar 2021 • Kazuhiro Kobayashi, Wen-Chin Huang, Yi-Chiao Wu, Patrick Lumban Tobing, Tomoki Hayashi, Tomoki Toda

In this paper, we present an open-source software for developing a nonparallel voice conversion (VC) system named crank.

Voice Conversion

166

The NU Voice Conversion System for the Voice Conversion Challenge 2020: On the Effectiveness of Sequence-to-sequence Models and Autoregressive Neural Vocoders

no code implementations • 9 Oct 2020 • Wen-Chin Huang, Patrick Lumban Tobing, Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Toda

In this paper, we present the voice conversion (VC) systems developed at Nagoya University (NU) for the Voice Conversion Challenge 2020 (VCC2020).

Task 2, Voice Conversion

Baseline System of Voice Conversion Challenge 2020 with Cyclic Variational Autoencoder and Parallel WaveGAN

1 code implementation • 9 Oct 2020 • Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Toda

In this paper, we present a description of the baseline system of Voice Conversion Challenge (VCC) 2020 with a cyclic variational autoencoder (CycleVAE) and Parallel WaveGAN (PWG), i.e., CycleVAEPWG.

Generative Adversarial Network, Task 2, +1

131

Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model with Pitch-dependent Dilated Convolution Neural Network

1 code implementation • 11 Jul 2020 • Yi-Chiao Wu, Tomoki Hayashi, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda

In this paper, a pitch-adaptive waveform generative model named Quasi-Periodic WaveNet (QPNet) is proposed to improve the limited pitch controllability of vanilla WaveNet (WN) using pitch-dependent dilated convolution neural networks (PDCNNs).

Non-Parallel Voice Conversion with Cyclic Variational Autoencoder

2 code implementations • 24 Jul 2019 • Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda

In this work, to overcome this problem, we propose to use CycleVAE-based spectral model that indirectly optimizes the conversion flow by recycling the converted features back into the system to obtain corresponding cyclic reconstructed spectra that can be directly optimized.

Voice Conversion

Statistical Voice Conversion with Quasi-Periodic WaveNet Vocoder

1 code implementation • 21 Jul 2019 • Yi-Chiao Wu, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda

However, because of the fixed dilated convolution and generic network architecture, the WN vocoder lacks robustness against unseen input features and often requires a huge network size to achieve acceptable speech quality.

Audio and Speech Processing, Sound

Quasi-Periodic WaveNet Vocoder: A Pitch Dependent Dilated Convolution Model for Parametric Speech Generation

1 code implementation • 1 Jul 2019 • Yi-Chiao Wu, Tomoki Hayashi, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda

In this paper, we propose a quasi-periodic neural network (QPNet) vocoder with a novel network architecture named pitch-dependent dilated convolution (PDCNN) to improve the pitch controllability of WaveNet (WN) vocoder.

Investigation of F0 conditioning and Fully Convolutional Networks in Variational Autoencoder based Voice Conversion

1 code implementation • 2 May 2019 • Wen-Chin Huang, Yi-Chiao Wu, Chen-Chou Lo, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, Yu Tsao, Hsin-Min Wang

Such hypothesis implies that during the conversion phase, the latent codes and the converted features in VAE based VC are in fact source F0 dependent.

Disentanglement, Voice Conversion

Refined WaveNet Vocoder for Variational Autoencoder Based Voice Conversion

no code implementations • 27 Nov 2018 • Wen-Chin Huang, Yi-Chiao Wu, Hsin-Te Hwang, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, Yu Tsao, Hsin-Min Wang

Conventional WaveNet vocoders are trained with natural acoustic features but conditioned on the converted features in the conversion stage for VC, and such a mismatch often causes significant quality and similarity degradation.

Voice Conversion
