Speech Synthesis

290 papers with code • 4 benchmarks • 19 datasets

Speech synthesis is the task of generating speech from another modality, such as text or lip movements.
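As a quick illustration of the text-to-speech case, the sketch below uses the open-source Coqui TTS package with one of its pretrained English models. The package, model name, and API calls are assumptions about a particular library version, not something taken from this page.

```python
# Minimal text-to-speech sketch (assumes the Coqui `TTS` package is installed,
# e.g. `pip install TTS`; the model name and API may differ between versions).
from TTS.api import TTS

# Load a pretrained single-speaker English model (hypothetical choice for illustration).
tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")

# Synthesize a waveform from text and write it to disk.
tts.tts_to_file(text="Speech synthesis turns text into audio.", file_path="sample.wav")
```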

Please note that the leaderboards here are not directly comparable across studies: they use mean opinion score (MOS) as the metric, and each study collects ratings from a different pool of Amazon Mechanical Turk listeners.
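For reference, MOS is just the arithmetic mean of listeners' 1-5 ratings, usually reported with a 95% confidence interval. The snippet below is a minimal sketch using made-up ratings; the variable names and data are hypothetical.

```python
import math
import statistics

# Hypothetical 1-5 naturalness ratings collected for one system (made-up data).
ratings = [4, 5, 3, 4, 4, 5, 4, 3, 4, 5]

mos = statistics.mean(ratings)              # mean opinion score
sd = statistics.stdev(ratings)              # sample standard deviation
ci95 = 1.96 * sd / math.sqrt(len(ratings))  # normal-approximation 95% confidence interval

print(f"MOS = {mos:.2f} +/- {ci95:.2f}")
```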

(Image credit: WaveNet: A Generative Model for Raw Audio)

Latest papers with no code

Towards Accurate Lip-to-Speech Synthesis in-the-Wild

no code yet • 2 Mar 2024

In this paper, we introduce a novel approach to address the task of synthesizing speech from silent videos of any in-the-wild speaker solely based on lip movements.

VoxGenesis: Unsupervised Discovery of Latent Speaker Manifold for Speech Synthesis

no code yet • 1 Mar 2024

This forces the model to learn a speaker distribution disentangled from the semantic content.

Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data

no code yet • 29 Feb 2024

Without any transcribed speech in a new language, this TTS model can generate intelligible speech in >30 unseen languages (CER difference of <10% to ground truth).

Bayesian Parameter-Efficient Fine-Tuning for Overcoming Catastrophic Forgetting

no code yet • 19 Feb 2024

Our results demonstrate that catastrophic forgetting can be overcome by our methods without degrading the fine-tuning performance, and using the Kronecker factored approximations produces a better preservation of the pre-training knowledge than the diagonal ones.

Speaking in Wavelet Domain: A Simple and Efficient Approach to Speed up Speech Diffusion Model

no code yet • 16 Feb 2024

Recently, Denoising Diffusion Probabilistic Models (DDPMs) have attained leading performances across a diverse range of generative tasks.

Speech Rhythm-Based Speaker Embeddings Extraction from Phonemes and Phoneme Duration for Multi-Speaker Speech Synthesis

no code yet • 11 Feb 2024

This paper proposes a speech rhythm-based method for speaker embeddings to model phoneme duration using a few utterances by the target speaker.

SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition

no code yet • 31 Jan 2024

Existing speech language models typically utilize task-dependent prompt tokens to unify various speech tasks in a single model.

EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial Networks

no code yet • 31 Jan 2024

The advent of Large Models marks a new era in machine learning, significantly outperforming smaller models by leveraging vast datasets to capture and synthesize complex patterns.

SpecDiff-GAN: A Spectrally-Shaped Noise Diffusion GAN for Speech and Music Synthesis

no code yet • 30 Jan 2024

Generative adversarial network (GAN) models can synthesize high-quality audio signals while ensuring fast sample generation.

MunTTS: A Text-to-Speech System for Mundari

no code yet • 28 Jan 2024

We present MunTTS, an end-to-end text-to-speech (TTS) system specifically for Mundari, a low-resource Indian language of the Austro-Asiatic family.