Voice Conversion

151 papers with code • 2 benchmarks • 5 datasets

Voice conversion is a technology that modifies a source speaker's speech so that it sounds like that of a target speaker, without changing the linguistic content.

Source: Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet

Low-latency Real-time Voice Conversion on CPU

koeai/llvc 1 Nov 2023

To our knowledge, LLVC achieves both the lowest resource usage and the lowest latency of any open-source voice conversion model.

348

BiSinger: Bilingual Singing Voice Synthesis

BiSinger-SVS/BiSinger 25 Sep 2023

We fuse monolingual singing datasets with open-source singing voice conversion techniques to generate bilingual singing voices while also exploring the potential use of bilingual speech data.

5

HM-Conformer: A Conformer-based audio deepfake detection system with hierarchical pooling and multi-level classification token aggregation methods

talkingnow/HM-Conformer 15 Sep 2023

Audio deepfake detection (ADD) is the task of detecting spoofing attacks generated by text-to-speech or voice conversion systems.

11

Emo-StarGAN: A Semi-Supervised Any-to-Many Non-Parallel Emotion-Preserving Voice Conversion

suhitaghosh10/emo-stargan 14 Sep 2023

Speech anonymisation prevents misuse of spoken data by removing any personal identifier while preserving at least linguistic content.

40

StarGAN-VC++: Towards Emotion Preserving Voice Conversion Using Deep Embeddings

arnabdas8901/StarGAN-VC_PlusPlus 14 Sep 2023

In this paper, we show that StarGANv2-VC fails to disentangle the speaker and emotion representations, which is pertinent to preserving emotion.

7

Evaluating Methods for Ground-Truth-Free Foreign Accent Conversion

unilight/seq2seq-vc 5 Sep 2023

In this work, we evaluate three recently proposed methods for ground-truth-free foreign accent conversion (FAC), all of which aim to harness the power of sequence-to-sequence (seq2seq) and non-parallel VC models to properly convert the accent and control the speaker identity.

63

FSD: An Initial Chinese Dataset for Fake Song Detection

xieyuankun/fsd-dataset 5 Sep 2023

In this paper, we initially construct a Chinese Fake Song Detection (FSD) dataset to investigate the field of song deepfake detection.

17

Phoneme Hallucinator: One-shot Voice Conversion via Set Expansion

PhonemeHallucinator/Phoneme_Hallucinator 11 Aug 2023

Objective and subjective evaluations show that Phoneme Hallucinator outperforms existing VC methods for both intelligibility and speaker similarity.

36

Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques

voice-privacy-challenge/voice-privacy-challenge-2024 5 Aug 2023

The growing use of voice user interfaces has led to a surge in the collection and storage of speech data.

22