Voice Cloning

27 papers with code • 0 benchmarks • 2 datasets

Voice cloning is a highly desired feature for personalized speech interfaces. A neural voice cloning system learns to synthesize a person's voice from only a few audio samples.
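Concretely, most few-shot cloning systems condense the reference clips into a fixed-size speaker embedding that then conditions a multi-speaker TTS model. The toy PyTorch sketch below illustrates only that embedding step; ToySpeakerEncoder is a made-up stand-in for a real speaker encoder (e.g. a d-vector or x-vector model), not code from any of the repositories listed here.

```python
import torch
import torch.nn as nn

class ToySpeakerEncoder(nn.Module):
    """Maps a mel-spectrogram (frames x n_mels) to a fixed-size speaker embedding."""
    def __init__(self, n_mels: int = 80, dim: int = 256):
        super().__init__()
        self.rnn = nn.GRU(n_mels, dim, batch_first=True)

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        _, h = self.rnn(mel.unsqueeze(0))          # (1, frames, n_mels) -> final hidden state
        return nn.functional.normalize(h[-1, 0], dim=-1)

def reference_speaker_embedding(encoder: nn.Module, reference_mels: list) -> torch.Tensor:
    """Average per-clip embeddings from a few reference recordings."""
    with torch.no_grad():
        embeddings = [encoder(mel) for mel in reference_mels]
    return torch.stack(embeddings).mean(dim=0)

# Usage: three short reference clips (~300 frames of 80-band mels each).
encoder = ToySpeakerEncoder()
refs = [torch.randn(300, 80) for _ in range(3)]
speaker_embedding = reference_speaker_embedding(encoder, refs)
print(speaker_embedding.shape)  # torch.Size([256]) -- conditioning vector for a TTS decoder
```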

Most implemented papers

Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning

PaddlePaddle/PaddleSpeech 9 Jul 2019

We present a multispeaker, multilingual text-to-speech (TTS) synthesis model based on Tacotron that is able to produce high quality speech in multiple languages.

Enhancing Suno's Bark Text-to-Speech Model: Addressing Limitations Through Meta's Encodec and Pre-Trained Hubert

serp-ai/bark-with-voice-clone Social Science Research Network (SSRN) 2023

Keywords: Bark, AI voice cloning, Suno, text-to-speech, artificial intelligence, audio generation, Meta's Encodec, audio codebooks, semantic tokens, HuBERT, transformer-based model, multilingual speech, wav2vec, linear projection head, embedding space, generative capabilities, pretrained model checkpoints

FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

funaudiollm/cosyvoice 4 Jul 2024

This report introduces FunAudioLLM, a model family designed to enhance natural voice interactions between humans and large language models (LLMs).

Neural Voice Cloning with a Few Samples

jackaduma/CycleGAN-VC2 NeurIPS 2018

Speaker adaptation is based on fine-tuning a multi-speaker generative model with a few cloning samples.
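The speaker-adaptation route can be sketched as: start from a trained multi-speaker model, introduce an embedding for the new speaker, and optimize only that embedding (optionally also the backbone) on the handful of cloning samples. The snippet below is a simplified illustration under assumed interfaces (the `tts(text_ids, speaker_embedding)` call and module names are hypothetical), not code from the paper or the linked repository.

```python
import torch
import torch.nn as nn

def adapt_to_new_speaker(tts: nn.Module,
                         cloning_samples: list,   # list of (text_ids, target_mel) pairs
                         embedding_dim: int = 256,
                         steps: int = 100) -> nn.Parameter:
    """Fine-tune only a fresh speaker embedding on a few cloning samples.

    Assumes `tts(text_ids, speaker_embedding)` returns a predicted mel-spectrogram;
    the multi-speaker backbone stays frozen, so a few samples suffice.
    """
    for p in tts.parameters():
        p.requires_grad_(False)                      # freeze the pretrained backbone

    new_speaker = nn.Parameter(torch.zeros(embedding_dim))
    optimizer = torch.optim.Adam([new_speaker], lr=1e-3)

    for _ in range(steps):
        for text_ids, target_mel in cloning_samples:
            pred_mel = tts(text_ids, new_speaker)
            loss = nn.functional.l1_loss(pred_mel, target_mel)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return new_speaker
```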

ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech

PaddlePaddle/PaddleSpeech 7 Nov 2022

In this paper, we extend the pretraining method for cross-lingual multi-speaker speech synthesis tasks, including cross-lingual multi-speaker voice cloning and cross-lingual multi-speaker speech editing.

Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques

deep-privacy/SA-toolkit 5 Aug 2023

The growing use of voice user interfaces has led to a surge in the collection and storage of speech data.

WavLM model ensemble for audio deepfake detection

pwc-1/Paper-9 14 Aug 2024

Audio deepfake detection has become a pivotal task over the last couple of years, as many recent speech synthesis and voice cloning systems generate highly realistic speech samples, thus enabling their use in malicious activities.

One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech

Tomiinek/Multilingual_Text_to_Speech 3 Aug 2020

We introduce an approach to multilingual speech synthesis which uses the meta-learning concept of contextual parameter generation and produces natural-sounding multilingual speech using more languages and less training data than previous approaches.
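Contextual parameter generation here means that, rather than training separate per-language layers, a small generator network maps a language embedding to the weights of parts of the synthesizer, so languages share statistical strength. A minimal, self-contained sketch of the idea for a single linear layer (an illustration only, not the authors' implementation, which generates convolutional encoder parameters):

```python
import torch
import torch.nn as nn

class LanguageConditionedLinear(nn.Module):
    """A linear layer whose weights are generated from a language embedding."""
    def __init__(self, n_languages: int, lang_dim: int, in_dim: int, out_dim: int):
        super().__init__()
        self.in_dim, self.out_dim = in_dim, out_dim
        self.lang_embedding = nn.Embedding(n_languages, lang_dim)
        # Parameter generator: language embedding -> flattened weight matrix + bias.
        self.generator = nn.Linear(lang_dim, out_dim * in_dim + out_dim)

    def forward(self, x: torch.Tensor, lang_id: torch.Tensor) -> torch.Tensor:
        params = self.generator(self.lang_embedding(lang_id))      # (out*in + out,)
        weight = params[: self.out_dim * self.in_dim].view(self.out_dim, self.in_dim)
        bias = params[self.out_dim * self.in_dim :]
        return nn.functional.linear(x, weight, bias)

# Usage: encode 10 phoneme frames of dimension 128 with language-specific weights.
layer = LanguageConditionedLinear(n_languages=5, lang_dim=32, in_dim=128, out_dim=128)
out = layer(torch.randn(10, 128), torch.tensor(2))
print(out.shape)  # torch.Size([10, 128])
```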

Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech

ming024/FastSpeech2 6 Mar 2021

The few-shot multi-speaker multi-style voice cloning task is to synthesize utterances with a voice and speaking style similar to those of a reference speaker, given only a few reference samples.
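In such setups, a pretrained speaker representation (e.g. a d-vector or x-vector) and a learnable style embedding are typically projected and added to the phoneme encoder outputs of a FastSpeech 2-style model before the variance adaptor and decoder run. The snippet below is only a schematic of that conditioning step; module and dimension names are illustrative and not taken from the ming024/FastSpeech2 code.

```python
import torch
import torch.nn as nn

class SpeakerStyleConditioner(nn.Module):
    """Injects speaker and style information into encoder hidden states."""
    def __init__(self, hidden_dim: int = 256, speaker_dim: int = 192, n_styles: int = 4):
        super().__init__()
        self.speaker_proj = nn.Linear(speaker_dim, hidden_dim)      # pretrained d/x-vector -> hidden
        self.style_embedding = nn.Embedding(n_styles, hidden_dim)   # learnable style table

    def forward(self, encoder_out: torch.Tensor,
                speaker_vec: torch.Tensor, style_id: torch.Tensor) -> torch.Tensor:
        # encoder_out: (T, hidden_dim) phoneme encodings; broadcast-add the conditioning vector.
        cond = self.speaker_proj(speaker_vec) + self.style_embedding(style_id)
        return encoder_out + cond.unsqueeze(0)

# Usage: condition 20 phoneme frames on an x-vector averaged from a few reference clips.
conditioner = SpeakerStyleConditioner()
hidden = conditioner(torch.randn(20, 256), torch.randn(192), torch.tensor(1))
print(hidden.shape)  # torch.Size([20, 256])
```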