Voice Conversion
151 papers with code • 2 benchmarks • 5 datasets
Voice Conversion is a technology that modifies a source speaker's speech so that it sounds like that of a target speaker, without changing the linguistic information.
Latest papers
Low-latency Real-time Voice Conversion on CPU
To our knowledge, LLVC achieves both the lowest resource usage and the lowest latency of any open-source voice conversion model.
Non-Parallel Training Approach for Emotional Voice Conversion Using CycleGAN
The focus of this research is to propose a non-parallel emotional voice conversion approach for Egyptian Arabic speech.
BiSinger: Bilingual Singing Voice Synthesis
We fuse monolingual singing datasets with open-source singing voice conversion techniques to generate bilingual singing voices while also exploring the potential use of bilingual speech data.
HM-Conformer: A Conformer-based audio deepfake detection system with hierarchical pooling and multi-level classification token aggregation methods
Audio deepfake detection (ADD) is the task of detecting spoofing attacks generated by text-to-speech or voice conversion systems.
Emo-StarGAN: A Semi-Supervised Any-to-Many Non-Parallel Emotion-Preserving Voice Conversion
Speech anonymisation prevents misuse of spoken data by removing any personal identifier while preserving at least linguistic content.
StarGAN-VC++: Towards Emotion Preserving Voice Conversion Using Deep Embeddings
In this paper, we show that StarGANv2-VC fails to disentangle the speaker and emotion representations, which is pertinent to preserving emotion.
Evaluating Methods for Ground-Truth-Free Foreign Accent Conversion
In this work, we evaluate three recently proposed methods for ground-truth-free FAC, all of which aim to harness the power of sequence-to-sequence (seq2seq) and non-parallel VC models to properly convert the accent and control the speaker identity.
FSD: An Initial Chinese Dataset for Fake Song Detection
In this paper, we initially construct a Chinese Fake Song Detection (FSD) dataset to investigate the field of song deepfake detection.
Phoneme Hallucinator: One-shot Voice Conversion via Set Expansion
Objective and subjective evaluations show that Phoneme Hallucinator outperforms existing VC methods for both intelligibility and speaker similarity.
Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques
The growing use of voice user interfaces has led to a surge in the collection and storage of speech data.