Voice Cloning
9 papers with code • 0 benchmarks • 0 datasets
Voice cloning is a highly desired feature for personalized speech interfaces. A neural voice cloning system learns to synthesize a person's voice from only a few audio samples.
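Most neural voice cloning systems condition a text-to-speech model on a fixed-size speaker embedding pooled from a few reference recordings. A minimal NumPy sketch of that pooling step is below; the random projection stands in for a trained speaker encoder, and all names and shapes (80 mel bins, a 64-dimensional embedding) are illustrative assumptions, not any specific paper's architecture.

```python
import numpy as np

def speaker_embedding(reference_mels, dim=64):
    """Pool frame-level mel features from a few reference clips into one
    fixed-size speaker embedding. The projection matrix is random here;
    a real system would use a trained speaker encoder."""
    rng = np.random.default_rng(0)                 # fixed seed: illustrative only
    n_mels = reference_mels[0].shape[1]
    proj = rng.standard_normal((n_mels, dim)) / np.sqrt(n_mels)
    frames = np.concatenate(reference_mels, axis=0)  # (total_frames, n_mels)
    emb = frames.mean(axis=0) @ proj                 # average-pool, then project
    return emb / np.linalg.norm(emb)                 # unit-normalize

# Three short "reference clips" of mel frames (toy data: 50 frames x 80 bins each)
clips = [np.random.rand(50, 80) for _ in range(3)]
e = speaker_embedding(clips)
print(e.shape)  # (64,)
```

The TTS decoder then receives this embedding alongside the text features at every synthesis step, so a single multi-speaker model can imitate an unseen voice.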
Benchmarks
These leaderboards are used to track progress in Voice Cloning
Libraries
Use these libraries to find Voice Cloning models and implementations
Most implemented papers
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning
We present a multispeaker, multilingual text-to-speech (TTS) synthesis model based on Tacotron that is able to produce high quality speech in multiple languages.
Neural Voice Cloning with a Few Samples
Speaker adaptation is based on fine-tuning a multi-speaker generative model with a few cloning samples.
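The speaker-adaptation idea above can be sketched as a tiny optimization problem: keep the multi-speaker model frozen and update only the new speaker's embedding on a handful of cloning samples. The linear "decoder" and all dimensions below are toy assumptions chosen so the example is self-contained, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Frozen multi-speaker "decoder": (text features, speaker embedding) -> mel frame
W_text = rng.standard_normal((32, 80)) * 0.1   # frozen text pathway (toy)
W_spk = rng.standard_normal((16, 80)) * 0.1    # frozen speaker pathway (toy)

def decode(text_feat, spk_emb):
    return text_feat @ W_text + spk_emb @ W_spk

def loss(spk_emb, samples):
    return float(np.mean([np.mean((decode(x, spk_emb) - y) ** 2)
                          for x, y in samples]))

# A few cloning samples from the target speaker: (text features, target mel frame)
samples = [(rng.standard_normal(32), rng.standard_normal(80)) for _ in range(5)]

# Adaptation: gradient descent on the speaker embedding only, model weights frozen
spk_emb = np.zeros(16)
loss_before = loss(spk_emb, samples)
lr = 0.05
for step in range(200):
    grad = np.zeros_like(spk_emb)
    for x, y in samples:
        err = decode(x, spk_emb) - y       # per-sample prediction error
        grad += 2.0 * (W_spk @ err)        # gradient of squared error w.r.t. embedding
    spk_emb -= lr * grad / len(samples)
loss_after = loss(spk_emb, samples)
```

Because only the embedding is trained, adaptation needs very little data; fine-tuning the whole model on the cloning samples is the heavier alternative the paper contrasts this with.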
ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech
In this paper, we extend the pretraining method for cross-lingual multi-speaker speech synthesis tasks, including cross-lingual multi-speaker voice cloning and cross-lingual multi-speaker speech editing.
One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech
We introduce an approach to multilingual speech synthesis which uses the meta-learning concept of contextual parameter generation and produces natural-sounding multilingual speech using more languages and less training data than previous approaches.
Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech
The few-shot multi-speaker multi-style voice cloning task is to synthesize utterances with voice and speaking style similar to a reference speaker given only a few reference samples.
Building Bilingual and Code-Switched Voice Conversion with Limited Training Data Using Embedding Consistency Loss
We achieve cross-lingual VC between Mandarin speech with multiple speakers and English speech with multiple speakers by applying bilingual bottleneck features.
Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text
Video represents the majority of internet traffic today, driving a continual race between the generation of higher quality content, transmission of larger file sizes, and the development of network infrastructure.
Empirical Study Incorporating Linguistic Knowledge on Filled Pauses for Personalized Spontaneous Speech Synthesis
We present a comprehensive empirical study for personalized spontaneous speech synthesis on the basis of linguistic knowledge.
Low-Resource Multilingual and Zero-Shot Multispeaker TTS
While neural methods for text-to-speech (TTS) have shown great advances in modeling multiple speakers, even in zero-shot settings, the amount of data needed for those approaches is generally not feasible for the vast majority of the world's over 6,000 spoken languages.