Voice Conversion

66 papers with code • 1 benchmark • 2 datasets

Voice Conversion is a technology that modifies the speech of a source speaker and makes their speech sound like that of another target speaker without changing the linguistic information.

Source: Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet


Latest papers with code

StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion

yl4579/StarGANv2-VC 21 Jul 2021

We present an unsupervised non-parallel many-to-many voice conversion (VC) method using a generative adversarial network (GAN) called StarGAN v2.

Voice Conversion

An Improved StarGAN for Emotional Voice Conversion: Enhancing Voice Quality and Data Augmentation

xianghenghe/Improved_StarGAN_Emotional_Voice_Conversion 18 Jul 2021

Emotional Voice Conversion (EVC) aims to convert the emotional style of a source speech signal to a target style while preserving its content and speaker identity information.

Data Augmentation • Speech Emotion Recognition +1

A Deep-Bayesian Framework for Adaptive Speech Duration Modification

ravi-0841/pytorch-speech-transformer 11 Jul 2021

During inference, we generate the attention map as a proxy for the similarity matrix between the given input speech and an unknown target speech signal.

Dynamic Time Warping • Voice Conversion

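The entry above is tagged with dynamic time warping (DTW) and describes an attention map standing in for a similarity matrix between two speech signals. As a rough illustration of what such a matrix is used for, here is a minimal sketch of classic DTW over a cost matrix. It is not the paper's attention-based method; the function name `dtw_path` and the toy sequences are assumptions made for this example only.

```python
def dtw_path(cost):
    """Classic DTW: cheapest monotonic path through a frame-by-frame cost matrix.

    cost[i][j] is the dissimilarity between source frame i and target frame j.
    Returns (total_cost, path), where path is a list of (i, j) index pairs.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    acc = [[inf] * m for _ in range(n)]  # accumulated cost table
    acc[0][0] = cost[0][0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best_prev = min(
                acc[i - 1][j] if i > 0 else inf,                  # step down
                acc[i][j - 1] if j > 0 else inf,                  # step right
                acc[i - 1][j - 1] if i > 0 and j > 0 else inf,    # diagonal
            )
            acc[i][j] = cost[i][j] + best_prev

    # Backtrack from the last cell to recover the alignment path.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((acc[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return acc[n - 1][m - 1], path


# Toy 1-D "feature" sequences; real systems would compare spectral frames.
source = [1, 2, 3]
target = [1, 2, 2, 3]
cost = [[abs(s - t) for t in target] for s in source]
total, path = dtw_path(cost)
print(total, path)
```

In this toy case the second source frame is stretched across two target frames, which is the kind of duration change an alignment path encodes.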

VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion

Wendison/VQMIVC 18 Jun 2021

One-shot voice conversion (VC), which performs conversion across arbitrary speakers with only a single target-speaker utterance for reference, can be effectively achieved by speech representation disentanglement.

Quantization • Voice Conversion

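The title of the entry above names vector quantization as one ingredient of the disentanglement. The listing does not describe how the paper applies it, so the following is only a generic sketch of the nearest-codebook lookup that vector quantization usually refers to. The function name `quantize`, the codebook values, and the 2-D frames are hypothetical and chosen purely for illustration.

```python
def quantize(frames, codebook):
    """Map each frame to its nearest codebook vector (squared Euclidean distance).

    Returns (indices, quantized): the chosen code index per frame and the
    corresponding codebook vectors.
    """
    indices = []
    for frame in frames:
        dists = [
            sum((a - b) ** 2 for a, b in zip(frame, code))
            for code in codebook
        ]
        indices.append(dists.index(min(dists)))
    quantized = [list(codebook[k]) for k in indices]
    return indices, quantized


# Hypothetical 3-entry codebook and three continuous 2-D "content" frames.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
frames = [[0.1, -0.2], [0.9, 1.2], [3.5, 4.1]]
indices, quantized = quantize(frames, codebook)
print(indices)
print(quantized)
```

Replacing continuous frames with a small set of discrete codes discards fine detail, which is one common motivation for using quantization to separate content from speaker characteristics.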

Voicy: Zero-Shot Non-Parallel Voice Conversion in Noisy Reverberant Environments

alexa/amazon-voice-conversion-voicy 16 Jun 2021

Voice Conversion (VC) is a technique that aims to transform the non-linguistic information of a source utterance to change the perceived identity of the speaker.

Voice Conversion

Low-Latency Real-Time Non-Parallel Voice Conversion based on Cyclic Variational Autoencoder and Multiband WaveRNN with Data-Driven Linear Prediction

patrickltobing/cyclevae-vc-neuralvoco 20 May 2021

To accommodate the low-latency real-time (LLRT) constraint on a CPU, we propose a novel CycleVAE framework that utilizes mel-spectrogram as spectral features and is built with a sparse network architecture.

Voice Conversion

Deep Learning Based Assessment of Synthetic Speech Naturalness

gabrielmittag/NISQA 23 Apr 2021

Further, we show that the reliability of deep learning-based naturalness prediction can be improved by transfer learning from speech quality prediction models that are trained on objective POLQA scores.

Speech Quality • Speech Synthesis +2

Utilizing Self-supervised Representations for MOS Prediction

s3prl/s3prl 7 Apr 2021

In this paper, we use self-supervised pre-trained models for MOS prediction.

Speech Quality • Voice Conversion

S2VC: A Framework for Any-to-Any Voice Conversion with Self-Supervised Pretrained Representations

s3prl/s3prl 7 Apr 2021

AUTOVC used dvector to extract speaker information, and self-supervised learning (SSL) features like wav2vec 2.0 are used in FragmentVC to extract the phonetic content information.

Self-Supervised Learning • Voice Conversion

Assem-VC: Realistic Voice Conversion by Assembling Modern Speech Synthesis Techniques

mindslab-ai/assem-vc 2 Apr 2021

This paper also introduces ground-truth-aligned (GTA) finetuning to VC, which significantly improves the quality and the speaker similarity of the outputs.

Speech Synthesis • Voice Conversion