About

Voice Conversion is a technology that modifies the speech of a source speaker and makes their speech sound like that of another target speaker without changing the linguistic information.

Source: Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet


Greatest papers with code

Towards Natural Bilingual and Code-Switched Speech Synthesis Based on Mix of Monolingual Recordings and Cross-Lingual Voice Conversion

16 Oct 2020 espnet/espnet

With these data, three neural TTS models (Tacotron2, Transformer, and FastSpeech) are applied to build bilingual and code-switched TTS.

SPEECH SYNTHESIS VOICE CONVERSION

The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS

6 Oct 2020 espnet/espnet

This paper presents the sequence-to-sequence (seq2seq) baseline system for the voice conversion challenge (VCC) 2020.

SPEECH RECOGNITION VOICE CONVERSION

Mel-spectrogram augmentation for sequence to sequence voice conversion

6 Jan 2020 makcedward/nlpaug

In addition, we proposed new policies (i.e., frequency warping, loudness and time length control) for more data variations.

VOICE CONVERSION
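
The three policies named in the mel-spectrogram augmentation entry above (frequency warping, loudness control, time length control) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation and not the nlpaug API: the function names, the assumption that the input is a natural-log mel-spectrogram of shape (n_mels, n_frames), and the simple linear warps are all hypothetical choices made for illustration.

```python
import numpy as np

def loudness_control(log_mel, gain):
    """Scale amplitude by `gain`; in the log domain this is an additive shift."""
    return log_mel + np.log(gain)

def time_length_control(log_mel, rate):
    """Change duration by linear interpolation along the time axis.

    rate > 1 shortens the utterance, rate < 1 lengthens it.
    """
    n_mels, n_frames = log_mel.shape
    new_len = max(1, int(round(n_frames / rate)))
    src = np.linspace(0, n_frames - 1, new_len)  # fractional source frame indices
    frames = np.arange(n_frames)
    return np.stack([np.interp(src, frames, row) for row in log_mel])

def frequency_warping(log_mel, alpha):
    """Resample the mel-bin axis at positions scaled by `alpha` (assumed linear warp)."""
    n_mels, n_frames = log_mel.shape
    bins = np.arange(n_mels)
    src = np.clip(bins * alpha, 0, n_mels - 1)  # keep positions inside the valid range
    return np.stack(
        [np.interp(src, bins, log_mel[:, t]) for t in range(n_frames)], axis=1
    )
```

For example, `time_length_control(mel, 2.0)` on an 80 x 100 input returns an 80 x 50 array, while `frequency_warping` and `loudness_control` preserve the input shape.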

Unsupervised Speech Decomposition via Triple Information Bottleneck

ICML 2020 auspicious3000/autovc

Speech information can be roughly decomposed into four components: language content, timbre, pitch, and rhythm.

STYLE TRANSFER VOICE CONVERSION

AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss

14 May 2019 auspicious3000/autovc

On the other hand, CVAE training is simple but does not come with the distribution-matching property of a GAN.

STYLE TRANSFER VOICE CONVERSION

Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks

23 Sep 2017 r9y9/gantts

In the proposed framework incorporating the GANs, the discriminator is trained to distinguish natural and generated speech parameters, while the acoustic models are trained to minimize the weighted sum of the conventional minimum generation loss and an adversarial loss for deceiving the discriminator.

SPEECH SYNTHESIS VOICE CONVERSION
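
The acoustic-model objective described in the GAN-based synthesis entry above, a weighted sum of a generation loss and an adversarial loss, can be sketched as follows. This is a rough illustration and not the code in r9y9/gantts: mean squared error stands in for the paper's generation loss, the adversarial term is a standard cross-entropy that rewards fooling the discriminator, and the names `generation_loss`, `adversarial_loss`, `acoustic_model_loss`, and `weight` are hypothetical.

```python
import numpy as np

def generation_loss(natural, generated):
    """Mean squared error between natural and generated speech parameters (assumed stand-in)."""
    return float(np.mean((natural - generated) ** 2))

def adversarial_loss(d_generated, eps=1e-12):
    """Cross-entropy that is small when the discriminator scores generated parameters as natural.

    `d_generated` holds discriminator outputs in (0, 1) for generated parameters.
    """
    return float(-np.mean(np.log(d_generated + eps)))

def acoustic_model_loss(natural, generated, d_generated, weight):
    """Weighted sum of the generation loss and the adversarial loss."""
    return generation_loss(natural, generated) + weight * adversarial_loss(d_generated)
```

Setting `weight` to 0 recovers training on the generation loss alone; larger values put more emphasis on deceiving the discriminator.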

Defense for Black-box Attacks on Anti-spoofing Models by Self-Supervised Learning

5 Jun 2020 andi611/Self-Supervised-Speech-Pretraining-and-Representation-Learning

To explore this issue, we proposed to employ Mockingjay, a self-supervised learning based model, to protect anti-spoofing models against adversarial attacks in the black-box scenario.

SELF-SUPERVISED LEARNING SPEAKER VERIFICATION VOICE CONVERSION

Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech

6 Mar 2021 ming024/FastSpeech2

The few-shot multi-speaker multi-style voice cloning task is to synthesize utterances with voice and speaking style similar to a reference speaker given only a few reference samples.

VOICE CONVERSION

Blow: a single-scale hyperconditioned flow for non-parallel raw-audio voice conversion

NeurIPS 2019 liusongxiang/StarGAN-Voice-Conversion

End-to-end models for raw audio generation are a challenge, especially if they have to work with non-parallel data, which is a desirable setup in many situations.

AUDIO GENERATION VOICE CONVERSION

StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks

6 Jun 2018 liusongxiang/StarGAN-Voice-Conversion

This paper proposes a method that allows non-parallel many-to-many voice conversion (VC) by using a variant of a generative adversarial network (GAN) called StarGAN.

VOICE CONVERSION