Voice Conversion

114 papers with code • 1 benchmark • 2 datasets

Voice Conversion is a technology that modifies the speech of a source speaker so that it sounds like that of a target speaker, without changing the linguistic content.

Source: Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet

TriAAN-VC: Triple Adaptive Attention Normalization for Any-to-Any Voice Conversion

winddori2002/TriAAN-VC 16 Mar 2023

Existing methods do not satisfy both aspects of VC at once, so their conversion outputs suffer from a trade-off between preserving the source content and capturing the target speaker's characteristics.

StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models

yl4579/StyleTTS-VC 29 Dec 2022

Here, we propose a novel approach to learning disentangled speech representations by transfer learning from style-based text-to-speech (TTS) models.

Speaking Style Conversion With Discrete Self-Supervised Units

gallilmaimon/DISSC 19 Dec 2022

We introduce a suite of quantitative and qualitative evaluation metrics for speaking style conversion, and empirically demonstrate that the proposed approach is significantly superior to the evaluated baselines.

SpeechLMScore: Evaluating speech generation using speech language model

espnet/espnet 8 Dec 2022

While human evaluation is the most reliable metric for evaluating speech generation systems, it is generally costly and time-consuming.

Hiding speaker's sex in speech using zero-evidence speaker representation in an analysis/synthesis pipeline

nii-yamagishilab/speaker_sex_attribute_privacy 29 Nov 2022

The use of modern vocoders in an analysis/synthesis pipeline allows us to investigate high-quality voice conversion that can be used for privacy purposes.

A unified one-shot prosody and speaker conversion system with self-supervised discrete speech units

b04901014/uuvc 12 Nov 2022

We devise a cascaded modular system that leverages self-supervised discrete speech units as the language representation.
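
Self-supervised discrete speech units are commonly obtained by assigning each frame of a self-supervised model's features to its nearest k-means centroid, optionally collapsing consecutive repeats. The sketch below illustrates only that quantization step under stated assumptions: the 2-D "features" and the three-centroid codebook are hypothetical toy values, and this is not the uuvc implementation.

```python
import math
from itertools import groupby


def nearest_unit(frame, codebook):
    """Return the index of the codebook centroid closest to `frame` (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda k: math.dist(frame, codebook[k]))


def to_discrete_units(frames, codebook, dedup=True):
    """Quantize frame-level features to unit IDs; optionally merge consecutive repeats."""
    units = [nearest_unit(f, codebook) for f in frames]
    if dedup:
        # Collapsing runs of identical units discards per-frame duration information.
        units = [u for u, _ in groupby(units)]
    return units


# Hypothetical 2-D features and a 3-centroid codebook, for illustration only.
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
frames = [(0.1, 0.0), (0.2, -0.1), (0.9, 1.1), (1.0, 0.9), (0.1, 1.9)]
print(to_discrete_units(frames, codebook))               # [0, 1, 2]
print(to_discrete_units(frames, codebook, dedup=False))  # [0, 0, 1, 1, 2]
```

Keeping the frame-level sequence preserves timing; the deduplicated sequence keeps only the order of units, which is why duration has to come from somewhere else if it is dropped.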

FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion

olawod/freevc 27 Oct 2022

Voice conversion (VC) can be achieved by first extracting content information from the source speech and speaker information from the target speech, and then reconstructing the waveform from this information.
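
The extract-then-reconstruct structure described in this excerpt can be illustrated with a deliberately toy sketch. Here each utterance is a hypothetical 1-D feature track, the hypothetical `extract_speaker` reduces speaker identity to the track's mean and standard deviation, and `extract_content` keeps only the normalized shape. This is plain mean-variance normalization chosen for illustration, not FreeVC's method, which relies on learned neural components.

```python
import statistics


def extract_content(features):
    """Toy content encoder: remove per-utterance mean and scale (speaker-dependent statistics)."""
    mu = statistics.fmean(features)
    sigma = statistics.pstdev(features) or 1.0  # guard against a constant track
    return [(x - mu) / sigma for x in features]


def extract_speaker(features):
    """Toy speaker encoder: summarize an utterance by its mean and standard deviation."""
    return statistics.fmean(features), statistics.pstdev(features)


def reconstruct(content, speaker):
    """Toy decoder: re-apply the target speaker's statistics to the normalized content."""
    mu, sigma = speaker
    return [c * sigma + mu for c in content]


def convert(source, target):
    """Content from the source, speaker statistics from the target."""
    return reconstruct(extract_content(source), extract_speaker(target))


# Hypothetical 1-D feature tracks, illustration only.
source = [3.0, 7.0, 7.0, 3.0]
target = [10.0, 14.0, 10.0, 14.0]
print(convert(source, target))  # [10.0, 14.0, 14.0, 10.0]
```

The output follows the source's up-down-down-up shape (the "content") while taking on the target's level and spread (the "speaker"), which is the separation the excerpt relies on, in miniature.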

GAN You Hear Me? Reclaiming Unconditional Speech Synthesis from Diffusion Models

rf5/simple-asgan 11 Oct 2022

As in the StyleGAN family of image synthesis models, ASGAN maps sampled noise to a disentangled latent vector, which is then mapped to a sequence of audio features so that signal aliasing is suppressed at every layer.
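
As a rough illustration of the data flow in this excerpt (sampled noise → latent vector → sequence of audio features), here is a toy sketch with random, untrained weights. The layer sizes, the leaky-ReLU mapping MLP and the position-conditioned `toy_synthesis` decoder are assumptions made for illustration. The sketch does not model ASGAN's anti-aliasing, which is the part that suppresses signal aliasing at every layer.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: y_j = sum_i W[j][i] * x_i + b_j."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def leaky_relu(x, slope=0.2):
    return [v if v > 0 else slope * v for v in x]


def random_layer(rng, n_in, n_out):
    """Randomly initialized (untrained) weights with 1/sqrt(n_in) scaling, zero bias."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def mapping_network(z, layers):
    """Map sampled noise z to an intermediate latent w through an MLP (StyleGAN-style)."""
    h = z
    for weights, bias in layers:
        h = leaky_relu(linear(h, weights, bias))
    return h


def toy_synthesis(w, frame_proj, n_frames):
    """Hypothetical decoder: one feature vector per frame, conditioned on w and frame position."""
    frames = []
    for t in range(n_frames):
        pos = t / max(n_frames - 1, 1)
        x = w + [pos]  # append a simple position input to the latent
        frames.append([math.tanh(v) for v in linear(x, *frame_proj)])
    return frames


rng = random.Random(0)
z_dim, w_dim, n_feats, n_frames = 8, 8, 4, 5
mapping_layers = [random_layer(rng, z_dim, w_dim), random_layer(rng, w_dim, w_dim)]
frame_proj = random_layer(rng, w_dim + 1, n_feats)

z = [rng.gauss(0.0, 1.0) for _ in range(z_dim)]
w = mapping_network(z, mapping_layers)
features = toy_synthesis(w, frame_proj, n_frames)
print(len(features), len(features[0]))  # 5 4
```

The point of the separate mapping step is that every frame is generated from the same latent `w` rather than directly from the raw noise.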

ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Speed

MelissaChen15/control-vc 23 Sep 2022

In this paper, we propose ControlVC, the first neural voice conversion system that achieves time-varying controls on pitch and speed.
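
To make "time-varying controls on pitch and speed" concrete, the sketch below applies a per-frame pitch ratio to a hypothetical F0 contour and resamples the frame sequence with a speed factor that changes over time. It is only an illustration of what such control curves mean on frame-level data; the function names, the zero-order-hold resampling and the example values are assumptions, and this is not ControlVC's actual mechanism.

```python
def apply_pitch_curve(f0, ratios):
    """Scale each frame's F0 by its own ratio; unvoiced frames (F0 == 0) stay unvoiced."""
    return [f * r if f > 0 else 0.0 for f, r in zip(f0, ratios)]


def apply_speed_curve(frames, speed_at):
    """Resample a frame sequence with a time-varying speed factor.

    speed_at(i) gives the playback speed at input frame i (>1 is faster, <1 is slower).
    Each output frame copies the input frame at the current read position (zero-order hold).
    """
    out, pos = [], 0.0
    while pos < len(frames):
        i = int(pos)
        out.append(frames[i])
        step = speed_at(i)
        if step <= 0:
            raise ValueError("speed must be positive")
        pos += step
    return out


# Hypothetical F0 contour in Hz (0.0 marks an unvoiced frame).
f0 = [200.0, 210.0, 0.0, 220.0, 230.0, 240.0]
# Raise pitch gradually: the ratio grows from 1.0 to 1.5 across the utterance.
ratios = [1.0 + 0.1 * t for t in range(len(f0))]
shifted = apply_pitch_curve(f0, ratios)
# Normal speed for the first three frames, double speed afterwards.
faster_end = apply_speed_curve(shifted, lambda i: 1.0 if i < 3 else 2.0)
print(len(shifted), len(faster_end))  # 6 5
```

A speed above 1 skips frames and shortens the output; a speed below 1 repeats frames and lengthens it. A real system would interpolate or resynthesize rather than copy frames.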
