Text-To-Speech Synthesis

92 papers with code • 6 benchmarks • 17 datasets

Text-To-Speech Synthesis is a machine learning task that involves converting written text into spoken words. The goal is to generate synthetic speech that sounds natural and resembles human speech as closely as possible.

Benchmarks

These leaderboards track progress in Text-To-Speech Synthesis.

Dataset	Best Model
LJSpeech	NaturalSpeech
CMUDict 0.7b	Token-Level Ensemble Distillation
20000 utterances	Mia
HUI speech corpus	Tacotron 2
Thorsten voice 21.02 neutral	Tacotron 2
Trinity Speech-Gesture Dataset	Match-TTSG

Libraries

Use these libraries to find Text-To-Speech Synthesis models and implementations.

Library	Papers	Stars
PaddlePaddle/PaddleSpeech	12	10,142
coqui-ai/TTS	10	29,239
keonlee9420/Expressive-FastSpeech2	5	259
TensorSpeech/TensorflowTTS	4	3,701

See all 12 libraries.

Latest papers with no code

Guided Flows for Generative Modeling and Decision Making

no code yet • 22 Nov 2023

Classifier-free guidance is a key component for enhancing the performance of conditional generative models across diverse tasks.

Generative Pre-training for Speech with Flow Matching

no code yet • 25 Oct 2023

Generative models have gained more and more attention in recent years for their remarkable success in tasks that required estimating and sampling data distribution to generate high-fidelity synthetic data.

Unified speech and gesture synthesis using flow matching

no code yet • 8 Oct 2023

As text-to-speech technologies achieve remarkable naturalness in read-aloud tasks, there is growing interest in multimodal synthesis of verbal and non-verbal communicative behaviour, such as spontaneous speech and associated body gestures.

The VoiceMOS Challenge 2023: Zero-shot Subjective Speech Quality Prediction for Multiple Domains

no code yet • 4 Oct 2023

We present the second edition of the VoiceMOS Challenge, a scientific event that aims to promote the study of automatic prediction of the mean opinion score (MOS) of synthesized and processed speech.

DurIAN-E: Duration Informed Attention Network For Expressive Text-to-Speech Synthesis

no code yet • 22 Sep 2023

This paper introduces an improved duration informed attention neural network (DurIAN-E) for expressive and high-fidelity text-to-speech (TTS) synthesis.

The FruitShell French synthesis system at the Blizzard 2023 Challenge

no code yet • 1 Sep 2023

The evaluation results of our system showed a quality MOS score of 3.6 for the Hub task and 3.4 for the Spoke task, placing our system at an average level among all participating teams.

Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis

no code yet • 31 Aug 2023

The spontaneous behavior that often occurs in conversations makes speech more human-like compared to reading-style.

SALTTS: Leveraging Self-Supervised Speech Representations for improved Text-to-Speech Synthesis

no code yet • 2 Aug 2023

In the SALTTS-parallel implementation, the representations from this second encoder are used for an auxiliary reconstruction loss with the SSL features.

Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech

no code yet • 31 Jul 2023

Neural text-to-speech systems are often optimized on L1/L2 losses, which make strong assumptions about the distributions of the target data space.

SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs

no code yet • 18 Jul 2023

In recent years, large-scale pre-trained speech language models (SLMs) have demonstrated remarkable advancements in various generative speech modeling applications, such as text-to-speech synthesis, voice conversion, and speech enhancement.
