Voice Conversion

65 papers with code • 1 benchmark • 2 datasets

Voice Conversion is a technology that modifies the speech of a source speaker and makes their speech sound like that of another target speaker without changing the linguistic information.

Source: Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet


Greatest papers with code

Towards Natural Bilingual and Code-Switched Speech Synthesis Based on Mix of Monolingual Recordings and Cross-Lingual Voice Conversion

espnet/espnet 16 Oct 2020

With these data, three neural TTS models -- Tacotron2, Transformer and FastSpeech are applied for building bilingual and code-switched TTS.

Speech Synthesis Voice Conversion

The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS

espnet/espnet 6 Oct 2020

This paper presents the sequence-to-sequence (seq2seq) baseline system for the voice conversion challenge (VCC) 2020.

Speech Recognition Voice Conversion
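
The cascade named in the title above (ASR followed by TTS) can be sketched as a simple composition. This is a minimal illustration, not the baseline's actual code: `cascade_convert`, `recognize`, and `synthesize` are hypothetical names, and the two callables stand in for whatever ASR and target-speaker TTS models are used.

```python
from typing import Callable, Sequence

Waveform = Sequence[float]


def cascade_convert(
    source_audio: Waveform,
    recognize: Callable[[Waveform], str],
    synthesize: Callable[[str], Waveform],
) -> Waveform:
    """Convert speech by transcribing it, then re-synthesizing the transcript.

    recognize:  hypothetical ASR step, source waveform -> text
    synthesize: hypothetical TTS step for the target speaker, text -> waveform
    """
    text = recognize(source_audio)  # linguistic content only; source voice is dropped
    return synthesize(text)         # content rendered in the target speaker's voice
```

Because only text passes between the two stages, the linguistic content is carried over while speaker identity comes entirely from the TTS model.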

Mel-spectrogram augmentation for sequence to sequence voice conversion

makcedward/nlpaug 6 Jan 2020

In addition, we proposed new policies (i.e., frequency warping, loudness and time length control) for more data variations.

Voice Conversion
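
To make the three policies named in the entry above more concrete, here is a rough NumPy sketch. It is not the paper's or nlpaug's implementation: the function names, the dB-gain formulation of loudness control, and the linear-interpolation warps are all assumptions chosen for illustration. The spectrogram is assumed to have shape `(n_mels, n_frames)`.

```python
import numpy as np


def loudness_control(mel_db: np.ndarray, gain_db: float) -> np.ndarray:
    """Shift a log-mel spectrogram (assumed to be in dB) by a constant gain."""
    return mel_db + gain_db


def time_length_control(mel: np.ndarray, rate: float) -> np.ndarray:
    """Resample along the time axis by linear interpolation.

    rate > 1 shortens the utterance, rate < 1 lengthens it.
    """
    n_mels, n_frames = mel.shape
    new_frames = max(1, int(round(n_frames / rate)))
    src = np.linspace(0, n_frames - 1, new_frames)  # positions to read from
    old = np.arange(n_frames)
    return np.stack([np.interp(src, old, row) for row in mel])


def frequency_warp(mel: np.ndarray, alpha: float) -> np.ndarray:
    """Warp the mel-bin axis: output bin k reads from input position k * alpha (clipped)."""
    n_mels, _ = mel.shape
    bins = np.arange(n_mels)
    src = np.clip(bins * alpha, 0, n_mels - 1)
    # Interpolate each frame (a column of mel) along the frequency axis.
    return np.stack([np.interp(src, bins, frame) for frame in mel.T], axis=1)
```

In practice such transforms would be applied with randomly sampled parameters per training example; the sampling ranges are not given in the excerpt and are left out here.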

S2VC: A Framework for Any-to-Any Voice Conversion with Self-Supervised Pretrained Representations

andi611/Self-Supervised-Speech-Pretraining-and-Representation-Learning 7 Apr 2021

AUTOVC used a d-vector to extract speaker information, and self-supervised learning (SSL) features such as wav2vec 2.0 are used in FragmentVC to extract the phonetic content information.

Self-Supervised Learning Voice Conversion

Defense for Black-box Attacks on Anti-spoofing Models by Self-Supervised Learning

andi611/Self-Supervised-Speech-Pretraining-and-Representation-Learning 5 Jun 2020

To explore this issue, we proposed to employ Mockingjay, a self-supervised learning based model, to protect anti-spoofing models against adversarial attacks in the black-box scenario.

Self-Supervised Learning Speaker Verification +1

Unsupervised Speech Decomposition via Triple Information Bottleneck

auspicious3000/autovc ICML 2020

Speech information can be roughly decomposed into four components: language content, timbre, pitch, and rhythm.

Style Transfer Voice Conversion

AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss

auspicious3000/autovc 14 May 2019

On the other hand, CVAE training is simple but does not come with the distribution-matching property of a GAN.

Style Transfer Voice Conversion

Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech

ming024/FastSpeech2 6 Mar 2021

The few-shot multi-speaker multi-style voice cloning task is to synthesize utterances with voice and speaking style similar to a reference speaker given only a few reference samples.

Voice Conversion

Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks

r9y9/gantts 23 Sep 2017

In the proposed framework incorporating the GANs, the discriminator is trained to distinguish natural and generated speech parameters, while the acoustic models are trained to minimize the weighted sum of the conventional minimum generation loss and an adversarial loss for deceiving the discriminator.

Speech Quality Speech Synthesis +1
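
The training objective described in the entry above, a weighted sum of a generation loss and an adversarial loss, can be sketched as follows. This is an illustrative approximation rather than the paper's formulation: mean squared error stands in for the conventional generation loss, the adversarial term is the usual negative log of the discriminator's output, and `weight` is a hypothetical hyperparameter name.

```python
import numpy as np


def generation_loss(natural: np.ndarray, generated: np.ndarray) -> float:
    # Assumption: mean squared error as a stand-in for the conventional generation loss.
    return float(np.mean((natural - generated) ** 2))


def adversarial_loss(d_out_generated: np.ndarray, eps: float = 1e-8) -> float:
    # d_out_generated: discriminator's probability that the generated parameters are natural.
    # The loss is small when the discriminator is deceived (outputs close to 1).
    return float(-np.mean(np.log(d_out_generated + eps)))


def acoustic_model_loss(
    natural: np.ndarray,
    generated: np.ndarray,
    d_out_generated: np.ndarray,
    weight: float,
) -> float:
    """Weighted sum of the generation loss and the adversarial loss."""
    return generation_loss(natural, generated) + weight * adversarial_loss(d_out_generated)
```

Setting `weight` to zero recovers training on the generation loss alone, which makes the role of the adversarial term easy to isolate.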