Voice Cloning

17 papers with code • 0 benchmarks • 2 datasets

Voice cloning is a highly desired feature for personalized speech interfaces. A neural voice cloning system learns to synthesize a person’s voice from only a few audio samples.


Most implemented papers

Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning

PaddlePaddle/PaddleSpeech 9 Jul 2019

We present a multispeaker, multilingual text-to-speech (TTS) synthesis model based on Tacotron that is able to produce high quality speech in multiple languages.

Neural Voice Cloning with a Few Samples

jackaduma/CycleGAN-VC2 NeurIPS 2018

Speaker adaptation is based on fine-tuning a multi-speaker generative model with a few cloning samples.

ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech

PaddlePaddle/PaddleSpeech 7 Nov 2022

In this paper, we extend the pretraining method for cross-lingual multi-speaker speech synthesis tasks, including cross-lingual multi-speaker voice cloning and cross-lingual multi-speaker speech editing.

Enhancing Suno's Bark Text-to-Speech Model: Addressing Limitations Through Meta's Encodec and Pre-Trained Hubert

serp-ai/bark-with-voice-clone Social Science Research Network (SSRN) 2023

Keywords: Bark, ai voice cloning, Suno, text-to-speech, artificial intelligence, audio generation, Meta's encodec, audio codebooks, semantic tokens, HuBert, transformer-based model, multilingual speech, wav2vec, linear projection head, embedding space, generative capabilities, pretrained model checkpoints

One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech

Tomiinek/Multilingual_Text_to_Speech 3 Aug 2020

We introduce an approach to multilingual speech synthesis which uses the meta-learning concept of contextual parameter generation and produces natural-sounding multilingual speech using more languages and less training data than previous approaches.

Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech

ming024/FastSpeech2 6 Mar 2021

The few-shot multi-speaker multi-style voice cloning task is to synthesize utterances with voice and speaking style similar to a reference speaker given only a few reference samples.

Building Bilingual and Code-Switched Voice Conversion with Limited Training Data Using Embedding Consistency Loss

inconnu11/Objective-evaluation_speech_synthesis 22 Apr 2021

We achieve cross-lingual VC between Mandarin speech with multiple speakers and English speech with multiple speakers by applying bilingual bottleneck features.

Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text

tpulkit/txt2vid 26 Jun 2021

Video represents the majority of internet traffic today, driving a continual race between the generation of higher quality content, transmission of larger file sizes, and the development of network infrastructure.

Discovery of Single Independent Latent Variable

shaham-lab/disilv 12 Oct 2021

Latent variable discovery is a central problem in data analysis with a broad range of applications in applied science.