Voice Conversion

66 papers with code • 1 benchmark • 2 datasets

Voice Conversion is a technology that modifies the speech of a source speaker and makes their speech sound like that of another target speaker without changing the linguistic information.

Source: Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet


Latest papers with code

StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion

yl4579/StarGANv2-VC 21 Jul 2021

We present an unsupervised non-parallel many-to-many voice conversion (VC) method using a generative adversarial network (GAN) called StarGAN v2.

Voice Conversion

41

An Improved StarGAN for Emotional Voice Conversion: Enhancing Voice Quality and Data Augmentation

xianghenghe/Improved_StarGAN_Emotional_Voice_Conversion 18 Jul 2021

Emotional Voice Conversion (EVC) aims to convert the emotional style of a source speech signal to a target style while preserving its content and speaker identity information.

Data Augmentation • Speech Emotion Recognition • +1

4

A Deep-Bayesian Framework for Adaptive Speech Duration Modification

ravi-0841/pytorch-speech-transformer 11 Jul 2021

During inference, we generate the attention map as a proxy for the similarity matrix between the given input speech and an unknown target speech signal.

Dynamic Time Warping • Voice Conversion

0
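
The snippet above treats an attention map as a stand-in for the similarity matrix between two speech signals, which is the quantity classical dynamic time warping (DTW) aligns over. As a rough illustration only, the sketch below runs textbook DTW on a hand-built cost matrix. It is not the paper's method: the function name `dtw_path`, the toy sequences, and the absolute-difference cost are all assumptions made for this example.

```python
from math import inf


def dtw_path(cost):
    """Classical DTW over a precomputed cost matrix (list of rows).

    Returns (total_cost, path), where path is a list of (source_index,
    target_index) pairs running from (0, 0) to the last frame of each signal.
    """
    n, m = len(cost), len(cost[0])
    # acc[i][j] = cheapest cost of aligning the first i source frames
    # with the first j target frames (1-based, with an infinite border).
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
            acc[i][j] = cost[i - 1][j - 1] + best_prev

    # Backtrack from the end, always stepping to the cheapest predecessor.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min(
            (acc[i - 1][j - 1], i - 1, j - 1),  # diagonal step
            (acc[i - 1][j], i - 1, j),          # advance source only
            (acc[i][j - 1], i, j - 1),          # advance target only
        )
        path.append((i - 1, j - 1))
    path.reverse()
    return acc[n][m], path


# Toy example (assumed data): the target repeats the middle frame once.
source = [0.0, 1.0, 2.0]
target = [0.0, 1.0, 1.0, 2.0]
cost = [[abs(s - t) for t in target] for s in source]
total, path = dtw_path(cost)
print(total, path)
```

In this toy case the recovered path maps source frame 1 to both target frames 1 and 2, which is how DTW expresses a local stretch in duration.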

VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion

Wendison/VQMIVC 18 Jun 2021

One-shot voice conversion (VC), which performs conversion across arbitrary speakers with only a single target-speaker utterance for reference, can be effectively achieved by speech representation disentanglement.

Quantization • Voice Conversion

99
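
The VQMIVC title names vector quantization as one ingredient of its content representation. The sketch below shows only the generic quantization step, in which each feature frame is replaced by its nearest codebook entry. It does not reproduce the paper's training procedure or its mutual-information objective, and the `quantize` helper, the two-entry codebook, and the frames are hypothetical values chosen for illustration.

```python
def quantize(frames, codebook):
    """Map each frame to its nearest codebook vector (squared Euclidean distance).

    Returns (indices, quantized_frames).
    """
    indices = []
    for frame in frames:
        dists = [
            sum((f - c) ** 2 for f, c in zip(frame, code))
            for code in codebook
        ]
        # Index of the closest code; ties resolve to the lowest index.
        indices.append(dists.index(min(dists)))
    return indices, [codebook[k] for k in indices]


# Toy example (assumed data): a 2-entry codebook and three 2-D frames.
codebook = [[0.0, 0.0], [1.0, 1.0]]
frames = [[0.1, -0.1], [0.9, 1.2], [0.4, 0.4]]
indices, quantized = quantize(frames, codebook)
print(indices)
print(quantized)
```

Because many slightly different frames collapse onto the same code, the discrete indices tend to discard fine detail, which is one reason quantized features are often used as a content bottleneck.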

Voicy: Zero-Shot Non-Parallel Voice Conversion in Noisy Reverberant Environments

alexa/amazon-voice-conversion-voicy 16 Jun 2021

Voice Conversion (VC) is a technique that aims to transform the non-linguistic information of a source utterance to change the perceived identity of the speaker.

Voice Conversion

2

Low-Latency Real-Time Non-Parallel Voice Conversion based on Cyclic Variational Autoencoder and Multiband WaveRNN with Data-Driven Linear Prediction

patrickltobing/cyclevae-vc-neuralvoco 20 May 2021

To accommodate the low-latency real-time (LLRT) constraint on a CPU, we propose a novel CycleVAE framework that uses mel-spectrograms as spectral features and is built with a sparse network architecture.

Voice Conversion

52

Deep Learning Based Assessment of Synthetic Speech Naturalness

gabrielmittag/NISQA 23 Apr 2021

Further, we show that the reliability of deep learning-based naturalness prediction can be improved by transfer learning from speech quality prediction models that are trained on objective POLQA scores.

Speech Quality • Speech Synthesis • +2

145

Utilizing Self-supervised Representations for MOS Prediction

s3prl/s3prl 7 Apr 2021

In this paper, we use self-supervised pre-trained models for MOS prediction.

Speech Quality • Voice Conversion

792

S2VC: A Framework for Any-to-Any Voice Conversion with Self-Supervised Pretrained Representations

s3prl/s3prl 7 Apr 2021

AUTOVC used d-vectors to extract speaker information, and self-supervised learning (SSL) features such as wav2vec 2.0 are used in FragmentVC to extract the phonetic content information.

Self-Supervised Learning • Voice Conversion

792

Assem-VC: Realistic Voice Conversion by Assembling Modern Speech Synthesis Techniques

mindslab-ai/assem-vc 2 Apr 2021

This paper also introduces GTA finetuning to VC, which significantly improves the quality and speaker similarity of the outputs.

Speech Synthesis • Voice Conversion

110