WGANSing: A Multi-Voice Singing Voice Synthesizer Based on the Wasserstein-GAN

26 Mar 2019 · Pritish Chandna, Merlijn Blaauw, Jordi Bonada, Emilia Gomez

We present a deep neural network-based singing voice synthesizer, inspired by the Deep Convolutional Generative Adversarial Networks (DCGAN) architecture and optimized using the Wasserstein-GAN algorithm. We use vocoder parameters for acoustic modelling to separate the influence of pitch and timbre, which facilitates modelling the large variability of pitch in the singing voice. Our network takes as input a block of consecutive frame-wise linguistic and fundamental frequency features, along with the global singer identity, and outputs vocoder features. At inference time, sequential blocks are concatenated using an overlap-add procedure. Using objective metrics and a subjective listening test, we show that the performance of our model is comparable to that of the state of the art and of the original samples. Synthesis examples are available on a supplementary website, and the source code is available on GitHub.
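The abstract states that, at inference time, fixed-size output blocks are stitched together with an overlap-add procedure, but it does not give the window shape, block length, or hop size. The sketch below is a minimal illustration under those unknowns: it assumes a triangular crossfade window and normalisation by the summed window weight (weighted overlap-add), and the names overlap_add and triangular_window are hypothetical, not taken from the WGANSing code.

```python
from typing import List

Frame = List[float]   # one frame of vocoder features
Block = List[Frame]   # a block of consecutive frames predicted by the network


def triangular_window(length: int) -> List[float]:
    # Assumed window: strictly positive weights, so every frame covered by at
    # least one block gets a non-zero total weight.
    return [float(min(i + 1, length - i)) for i in range(length)]


def overlap_add(blocks: List[Block], hop: int) -> List[Frame]:
    """Hypothetical weighted overlap-add of frame-wise feature blocks.

    Block k is assumed to cover output frames [k*hop, k*hop + block_len).
    Overlapping frames are blended with a triangular window and divided by
    the accumulated window weight, so regions where blocks agree are kept
    unchanged and regions where they differ are crossfaded.
    """
    if not blocks:
        return []
    block_len = len(blocks[0])
    dim = len(blocks[0][0])
    if hop < 1 or hop > block_len:
        raise ValueError("hop must be in [1, block_len] to avoid gaps")

    total = (len(blocks) - 1) * hop + block_len
    out = [[0.0] * dim for _ in range(total)]
    weight = [0.0] * total
    win = triangular_window(block_len)

    for k, block in enumerate(blocks):
        start = k * hop
        for i, frame in enumerate(block):
            w = win[i]
            weight[start + i] += w
            row = out[start + i]
            for d in range(dim):
                row[d] += w * frame[d]

    # Normalise each output frame by the total weight it received.
    return [[v / weight[t] for v in out[t]] for t in range(total)]


if __name__ == "__main__":
    # Two 4-frame blocks of 1-D features, hop of 2 frames: the overlapping
    # frames 2 and 3 are crossfaded from 0.0 towards 1.0.
    seq = overlap_add([[[0.0]] * 4, [[1.0]] * 4], hop=2)
    print([round(f[0], 3) for f in seq])
```

Normalising by the summed window weight, rather than relying on a window whose shifted copies sum exactly to one, keeps the sketch valid for any hop between 1 and the block length; the real system may use a different window or hop.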

Code

MTG/WGANSing (official)

pc2752/Multi_Voice_Sing_Speak_Sing (official)
