Voice Conversion

147 papers with code • 2 benchmarks • 5 datasets

Voice Conversion is a technology that modifies the speech of a source speaker and makes their speech sound like that of another target speaker without changing the linguistic information.

Source: Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet

BiSinger: Bilingual Singing Voice Synthesis

BiSinger-SVS/BiSinger 25 Sep 2023

We fuse monolingual singing datasets with open-source singing voice conversion techniques to generate bilingual singing voices while also exploring the potential use of bilingual speech data.

Stars: 4

Emo-StarGAN: A Semi-Supervised Any-to-Many Non-Parallel Emotion-Preserving Voice Conversion

suhitaghosh10/emo-stargan 14 Sep 2023

Speech anonymisation prevents misuse of spoken data by removing any personal identifier while preserving at least linguistic content.

Stars: 39

StarGAN-VC++: Towards Emotion Preserving Voice Conversion Using Deep Embeddings

arnabdas8901/StarGAN-VC_PlusPlus 14 Sep 2023

In this paper, we show that StarGANv2-VC fails to disentangle the speaker and emotion representations, which is pertinent to preserving emotion.

Stars: 7

Evaluating Methods for Ground-Truth-Free Foreign Accent Conversion

unilight/seq2seq-vc 5 Sep 2023

In this work, we evaluate three recently proposed methods for ground-truth-free foreign accent conversion (FAC), all of which aim to harness the power of sequence-to-sequence (seq2seq) and non-parallel VC models to properly convert the accent and control the speaker identity.

Stars: 56

FSD: An Initial Chinese Dataset for Fake Song Detection

xieyuankun/fsd-dataset 5 Sep 2023

In this paper, we initially construct a Chinese Fake Song Detection (FSD) dataset to investigate the field of song deepfake detection.

Stars: 16

Phoneme Hallucinator: One-shot Voice Conversion via Set Expansion

PhonemeHallucinator/Phoneme_Hallucinator 11 Aug 2023

Objective and subjective evaluations show that Phoneme Hallucinator outperforms existing VC methods for both intelligibility and speaker similarity.

Stars: 35

Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques

deep-privacy/SA-toolkit 5 Aug 2023

The growing use of voice user interfaces has led to a surge in the collection and storage of speech data.

Stars: 14

Rhythm Modeling for Voice Conversion

bshall/urhythmic 12 Jul 2023

Voice conversion aims to transform source speech into a different target voice.

Stars: 70

Disentanglement in a GAN for Unconditional Speech Synthesis

rf5/simple-asgan 4 Jul 2023

We confirm that ASGAN's latent space is disentangled: we demonstrate how simple linear operations in the space can be used to perform several tasks unseen during training.

Stars: 56