66 papers with code • 1 benchmark • 2 datasets
Voice Conversion is a technology that modifies the speech of a source speaker so that it sounds like the speech of a target speaker, without changing the linguistic information.
We present an unsupervised non-parallel many-to-many voice conversion (VC) method using a generative adversarial network (GAN) called StarGAN v2.
Emotional Voice Conversion (EVC) aims to convert the emotional style of a source speech signal to a target style while preserving its content and speaker identity information.
During inference, we generate the attention map as a proxy for the similarity matrix between the given input speech and an unknown target speech signal.
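As an illustrative sketch (not the paper's actual model), such an attention map can be computed as row-normalized scaled dot-product attention between frame-level features of the input speech and of a reference utterance; each row is then a soft similarity profile over the reference frames. The feature dimensions and random inputs below are placeholders.

```python
import numpy as np

def attention_map(source_feats, target_feats):
    """Scaled dot-product attention between two utterances.

    source_feats: (T_src, D) frame-level features of the input speech
    target_feats: (T_tgt, D) frame-level features of the reference speech
    Returns a (T_src, T_tgt) map whose rows sum to 1, acting as a proxy
    for the frame-to-frame similarity matrix.
    """
    d = source_feats.shape[-1]
    scores = source_feats @ target_feats.T / np.sqrt(d)  # raw similarity matrix
    scores -= scores.max(axis=-1, keepdims=True)         # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
A = attention_map(rng.normal(size=(5, 16)), rng.normal(size=(7, 16)))
print(A.shape)  # (5, 7)
```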
One-shot voice conversion (VC), which performs conversion across arbitrary speakers with only a single target-speaker utterance for reference, can be effectively achieved by speech representation disentanglement.
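A toy numpy sketch of the disentanglement idea, under a strong simplifying assumption: utterance-level mean statistics stand in for a learned speaker encoder, and what remains after removing them stands in for content. Real systems learn both encoders; this only illustrates the recombination step behind one-shot conversion.

```python
import numpy as np

def speaker_embedding(feats):
    # Utterance-level mean as a crude stand-in for a learned speaker encoder.
    return feats.mean(axis=0, keepdims=True)

def content(feats):
    # Removing utterance-level statistics crudely separates speaker-dependent
    # offsets from frame-level linguistic content.
    return feats - speaker_embedding(feats)

def convert(source_feats, target_feats):
    # One-shot conversion: source content recombined with the speaker
    # embedding extracted from a single target-speaker utterance.
    return content(source_feats) + speaker_embedding(target_feats)

rng = np.random.default_rng(1)
src = rng.normal(size=(6, 8)) + 2.0   # placeholder source features
tgt = rng.normal(size=(9, 8)) - 1.0   # placeholder target reference
out = convert(src, tgt)
```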
Voice Conversion (VC) is a technique that aims to transform the non-linguistic information of a source utterance to change the perceived identity of the speaker.
To accommodate the low-latency real-time (LLRT) constraint on CPU, we propose a novel CycleVAE framework that uses mel-spectrograms as spectral features and is built on a sparse network architecture.
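One common way to obtain such a sparse architecture is magnitude-based weight pruning; the sketch below (an assumption, not the paper's specific pruning scheme) keeps only the largest-magnitude fraction of a weight matrix, which is what makes CPU real-time inference tractable.

```python
import numpy as np

def prune(weights, density=0.1):
    """Magnitude pruning: zero all but the largest `density` fraction of weights."""
    k = int(weights.size * density)
    thresh = np.sort(np.abs(weights), axis=None)[-k]  # k-th largest magnitude
    return np.where(np.abs(weights) >= thresh, weights, 0.0)

W = np.random.default_rng(0).normal(size=(256, 256))
W_sparse = prune(W, density=0.1)
sparsity = (W_sparse == 0).mean()
print(round(sparsity, 2))  # ≈ 0.9
```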
Further, we show that the reliability of deep learning-based naturalness prediction can be improved by transfer learning from speech quality prediction models that are trained on objective POLQA scores.
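The transfer-learning step can be illustrated with a minimal warm-start sketch (linear model, synthetic data, all names hypothetical): weights pretrained on objective quality scores initialize a model that is then finetuned on naturalness labels by gradient descent.

```python
import numpy as np

def finetune(w_init, X, y, lr=0.01, steps=200):
    """Warm-start finetuning of a linear predictor.

    w_init: weights assumed pretrained on objective quality scores
            (e.g. POLQA); X, y: naturalness-labeled finetuning data.
    """
    w = w_init.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))                              # synthetic features
y = X @ np.array([0.5, -0.2, 0.1, 0.3]) + 0.05 * rng.normal(size=50)
w0 = np.array([0.4, -0.1, 0.0, 0.2])                      # "pretrained" weights
w = finetune(w0, X, y)
```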
In this paper, we use self-supervised pre-trained models for MOS prediction.
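A typical recipe (a hedged sketch, not necessarily this paper's exact head) mean-pools the frame-level hidden states of the pre-trained SSL model into an utterance embedding and applies a small regression head to predict the MOS, which lies in [1, 5].

```python
import numpy as np

def predict_mos(ssl_feats, w, b):
    """Predict a MOS score from frame-level SSL features.

    ssl_feats: (T, D) hidden states from a pre-trained SSL model
    w: (D,) regression weights, b: scalar bias (finetuned on MOS labels)
    """
    pooled = ssl_feats.mean(axis=0)        # utterance-level embedding
    score = float(pooled @ w + b)
    return min(5.0, max(1.0, score))       # MOS is bounded in [1, 5]

feats = np.array([[0.1, 0.2], [0.3, 0.4]])  # toy 2-frame, 2-dim features
print(predict_mos(feats, np.array([1.0, 2.0]), 3.0))  # 3.8
```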
AUTOVC used d-vectors to extract speaker information, and self-supervised learning (SSL) features such as wav2vec 2.0 are used in FragmentVC to extract the phonetic content information.
This paper also introduces ground-truth-aligned (GTA) finetuning in VC, which significantly improves the quality and the speaker similarity of the converted outputs.
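The core of GTA finetuning is running the acoustic model in teacher-forcing mode so its predicted frames stay time-aligned with the reference; the vocoder is then finetuned on these predicted features paired with the real waveform, learning to compensate the acoustic model's errors. The sketch below assumes a hypothetical per-frame `acoustic_model(text, prev_frame)` interface.

```python
import numpy as np

def gta_features(acoustic_model, text, ref_mel):
    """Generate ground-truth-aligned features via teacher forcing.

    At each step the model receives the *reference* previous frame, so the
    output stays aligned with ref_mel while containing model predictions.
    """
    out = np.zeros_like(ref_mel)
    prev = np.zeros(ref_mel.shape[1])
    for t in range(ref_mel.shape[0]):
        out[t] = acoustic_model(text, prev)
        prev = ref_mel[t]  # teacher forcing with the true frame
    return out

ref = np.arange(12.0).reshape(4, 3)          # toy reference mel frames
dummy = lambda text, prev: prev              # toy model that echoes its input
gta = gta_features(dummy, None, ref)
```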