Voice Conversion

151 papers with code • 2 benchmarks • 5 datasets

Voice Conversion modifies the speech of a source speaker so that it sounds like that of a target speaker, without changing the linguistic content.

Source: Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet


FlashSpeech: Efficient Zero-Shot Speech Synthesis

zhenye234/CoMoSpeech 23 Apr 2024

The generation processes of FlashSpeech can be achieved efficiently with one or two sampling steps while maintaining high audio quality and high similarity to the audio prompt for zero-shot speech generation.

147

High-Fidelity Neural Phonetic Posteriorgrams

interactiveaudiolab/ppgs 27 Feb 2024

A phonetic posteriorgram (PPG) is a time-varying categorical distribution over acoustic units of speech (e.g., phonemes).

53
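
The definition of a PPG above can be illustrated with a minimal sketch. It assumes per-frame phoneme scores (logits) are already available and applies a softmax to each frame, so that every time step holds a categorical distribution over phonemes. The phoneme inventory, the score values, and the helper names `softmax` and `to_ppg` are hypothetical and chosen only for illustration; this is not the method of the listed paper or the API of its repository.

```python
import math

# Hypothetical phoneme inventory, chosen only for illustration.
PHONEMES = ["aa", "iy", "s", "sil"]


def softmax(logits):
    """Convert one frame of raw scores into a categorical distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_ppg(frame_logits):
    """Map per-frame logits (frames x phonemes) to a PPG of the same shape."""
    return [softmax(frame) for frame in frame_logits]


# Made-up scores for three frames; a real system would obtain
# these from an acoustic model (assumption, not from the source).
frame_logits = [
    [2.0, 0.1, -1.0, 0.0],
    [0.2, 3.0, 0.0, -0.5],
    [-1.0, 0.0, 0.5, 4.0],
]

ppg = to_ppg(frame_logits)
for t, frame in enumerate(ppg):
    best = PHONEMES[frame.index(max(frame))]
    print(f"frame {t}: most likely phoneme = {best}")
```

Each row of `ppg` sums to 1, which is what makes it a distribution rather than a hard phoneme label; a downstream system can consume the full rows instead of only the most likely phoneme.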

SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation

0nutation/speechgpt 24 Jan 2024

It comprises an autoregressive model based on LLM for semantic information modeling and a non-autoregressive model employing flow matching for perceptual information modeling.

902

DurFlex-EVC: Duration-Flexible Emotional Voice Conversion with Parallel Generation

hs-oh-prml/durflexevc 16 Jan 2024

Emotional voice conversion (EVC) seeks to modify the emotional tone of a speaker's voice while preserving the original linguistic content and the speaker's unique vocal characteristics.

35

AutoVisual Fusion Suite: A Comprehensive Evaluation of Image Segmentation and Voice Conversion Tools on HuggingFace Platform

amirrezahmi/video-inpainting-and-voice-cloning 17 Dec 2023

This study presents a comprehensive evaluation of tools available on the HuggingFace platform for two pivotal applications in artificial intelligence: image segmentation and voice conversion.

22

What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection

cecile-hi/regularized-adaptive-weight-modification 15 Dec 2023

The rapid evolution of speech synthesis and voice conversion has raised substantial concerns due to the potential misuse of such technology, prompting a pressing need for effective audio deepfake detection mechanisms.

14

HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis

sh-lee-prml/hierspeechpp 21 Nov 2023

Furthermore, we significantly improve the naturalness and speaker similarity of synthetic speech even in zero-shot speech synthesis scenarios.

1,082

Improving fairness for spoken language understanding in atypical speech with Text-to-Speech

wanghelin1997/aty-tts 16 Nov 2023

Spoken language understanding (SLU) systems often exhibit suboptimal performance in processing atypical speech, typically caused by neurological conditions and motor impairments.

8

CSLP-AE: A Contrastive Split-Latent Permutation Autoencoder Framework for Zero-Shot Electroencephalography Signal Conversion

andersxa/cslp-ae NeurIPS 2023

While the present work only considers conversion of EEG, the proposed CSLP-AE provides a general framework for signal conversion and extraction of content (task activation) and style (subject variability) components of general interest for the modeling and analysis of biological signals.

8
13 Nov 2023

Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Prior for Zero-shot Speaker Adaptation

hayeong0/Diff-HierVC 8 Nov 2023

Finally, by using the masked prior in diffusion models, our model can improve the speaker adaptation quality.

144