Text-To-Speech Synthesis

92 papers with code • 6 benchmarks • 17 datasets

Text-To-Speech Synthesis is a machine learning task that involves converting written text into spoken words. The goal is to generate synthetic speech that sounds natural and resembles human speech as closely as possible.
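
Most neural TTS systems follow a two-stage pipeline: an acoustic model maps text (or phonemes) to an intermediate representation such as a mel-spectrogram, and a vocoder converts that representation into a waveform. Below is a minimal sketch of that structure; the module names and layer choices are hypothetical stand-ins, not any particular published model.

```python
import torch
import torch.nn as nn

class AcousticModel(nn.Module):
    """Hypothetical stand-in: maps token IDs to a mel-spectrogram."""
    def __init__(self, vocab_size=100, n_mels=80, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.to_mel = nn.Linear(hidden, n_mels)

    def forward(self, tokens):                 # (batch, time)
        h, _ = self.encoder(self.embed(tokens))
        return self.to_mel(h)                  # (batch, time, n_mels)

class Vocoder(nn.Module):
    """Hypothetical stand-in: upsamples mel frames to waveform samples."""
    def __init__(self, n_mels=80, hop=256):
        super().__init__()
        self.proj = nn.Linear(n_mels, hop)

    def forward(self, mel):                    # (batch, time, n_mels)
        return self.proj(mel).flatten(1)       # (batch, time * hop)

tokens = torch.randint(0, 100, (1, 20))        # dummy phoneme/character IDs
mel = AcousticModel()(tokens)
wav = Vocoder()(mel)
print(mel.shape, wav.shape)                    # torch.Size([1, 20, 80]) torch.Size([1, 5120])
```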

Latest papers with no code

Guided Flows for Generative Modeling and Decision Making

no code yet • 22 Nov 2023

Classifier-free guidance is a key component for enhancing the performance of conditional generative models across diverse tasks.
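
Concretely, classifier-free guidance trains a single conditional model while randomly dropping the condition, and at sampling time extrapolates from the unconditional prediction toward the conditional one. A minimal sketch of the sampling-time combination, assuming a hypothetical denoiser/velocity network `model(x, t, cond)` that accepts `cond=None`:

```python
import torch

def guided_prediction(model, x, t, cond, guidance_scale=2.0):
    """Classifier-free guidance: blend unconditional and conditional predictions.

    guidance_scale = 0 -> unconditional, 1 -> plain conditional,
    > 1 -> amplifies the influence of the condition.
    """
    pred_uncond = model(x, t, cond=None)   # condition dropped
    pred_cond = model(x, t, cond=cond)     # condition provided
    return pred_uncond + guidance_scale * (pred_cond - pred_uncond)

# Toy usage with a dummy "model"; in practice this is a diffusion or flow network.
dummy = lambda x, t, cond: x * (0.5 if cond is None else 1.0)
x = torch.randn(2, 8)
print(guided_prediction(dummy, x, t=0.3, cond="speaker_1").shape)  # torch.Size([2, 8])
```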

Generative Pre-training for Speech with Flow Matching

no code yet • 25 Oct 2023

Generative models have attracted increasing attention in recent years for their remarkable success in tasks that require estimating and sampling data distributions to generate high-fidelity synthetic data.
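
Flow matching trains a network to regress the velocity of a simple probability path from noise to data; with linear paths the target velocity is just `x1 - x0`. A minimal training-step sketch, assuming a hypothetical `model(x_t, t)` that returns a velocity with the same shape as its input:

```python
import torch
import torch.nn as nn

def flow_matching_loss(model, x1):
    """Conditional flow matching with linear paths x_t = (1 - t) * x0 + t * x1."""
    x0 = torch.randn_like(x1)                  # noise sample
    t = torch.rand(x1.size(0), 1)              # per-example time in [0, 1]
    x_t = (1 - t) * x0 + t * x1                # point on the path
    v_target = x1 - x0                         # velocity of the linear path
    v_pred = model(x_t, t)
    return ((v_pred - v_target) ** 2).mean()

# Toy usage: x1 stands in for speech features such as mel-spectrogram frames.
net = nn.Sequential(nn.Linear(81, 128), nn.SiLU(), nn.Linear(128, 80))
model = lambda x_t, t: net(torch.cat([x_t, t], dim=-1))
loss = flow_matching_loss(model, torch.randn(16, 80))
loss.backward()
```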

Unified speech and gesture synthesis using flow matching

no code yet • 8 Oct 2023

As text-to-speech technologies achieve remarkable naturalness in read-aloud tasks, there is growing interest in multimodal synthesis of verbal and non-verbal communicative behaviour, such as spontaneous speech and associated body gestures.

The VoiceMOS Challenge 2023: Zero-shot Subjective Speech Quality Prediction for Multiple Domains

no code yet • 4 Oct 2023

We present the second edition of the VoiceMOS Challenge, a scientific event that aims to promote the study of automatic prediction of the mean opinion score (MOS) of synthesized and processed speech.
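
MOS is simply the mean of listener ratings on a 1-5 scale, and MOS-prediction systems are typically evaluated by error and correlation between predicted and human scores. A small, self-contained illustration with made-up ratings and hypothetical predictor outputs:

```python
import numpy as np
from scipy.stats import spearmanr

# Made-up listener ratings (1-5) for three synthesized utterances.
ratings = {
    "utt_a": [4, 5, 4, 4],
    "utt_b": [2, 3, 2, 2],
    "utt_c": [3, 4, 3, 3],
}
true_mos = {utt: float(np.mean(r)) for utt, r in ratings.items()}

# Hypothetical outputs of an automatic MOS predictor for the same utterances.
pred_mos = {"utt_a": 4.1, "utt_b": 2.6, "utt_c": 3.4}

utts = sorted(ratings)
srcc, _ = spearmanr([true_mos[u] for u in utts], [pred_mos[u] for u in utts])
print(true_mos, "SRCC:", round(srcc, 3))
```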

DurIAN-E: Duration Informed Attention Network For Expressive Text-to-Speech Synthesis

no code yet • 22 Sep 2023

This paper introduces an improved duration informed attention neural network (DurIAN-E) for expressive and high-fidelity text-to-speech (TTS) synthesis.
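
Duration-informed models expand phoneme-level encoder outputs to frame level according to per-phoneme durations before decoding acoustics. The sketch below shows that length-regulation step in general terms (an assumption about this family of models, not DurIAN-E's exact implementation):

```python
import torch

def length_regulate(phoneme_encodings, durations):
    """Repeat each phoneme encoding by its duration in frames.

    phoneme_encodings: (num_phonemes, hidden)
    durations:         (num_phonemes,) integer frame counts
    returns:           (total_frames, hidden)
    """
    return torch.repeat_interleave(phoneme_encodings, durations, dim=0)

enc = torch.randn(4, 8)            # 4 phonemes, hidden size 8
dur = torch.tensor([3, 5, 2, 4])   # predicted frames per phoneme
frames = length_regulate(enc, dur)
print(frames.shape)                # torch.Size([14, 8])
```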

The FruitShell French synthesis system at the Blizzard 2023 Challenge

no code yet • 1 Sep 2023

The evaluation results of our system showed a quality MOS score of 3.6 for the Hub task and 3.4 for the Spoke task, placing our system at an average level among all participating teams.

Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis

no code yet • 31 Aug 2023

The spontaneous behavior that often occurs in conversations makes speech sound more human-like than reading-style speech.

SALTTS: Leveraging Self-Supervised Speech Representations for improved Text-to-Speech Synthesis

no code yet • 2 Aug 2023

In the SALTTS-parallel implementation, the representations from this second encoder are used for an auxiliary reconstruction loss with the SSL features.
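
In general terms, such an auxiliary objective has a second branch of the TTS model regress toward frame-level features from a pre-trained self-supervised (SSL) speech encoder, on top of the usual spectrogram loss. A hedged sketch of how the combined loss might be wired; the tensor shapes and weighting factor are placeholders of mine, not values from the paper:

```python
import torch
import torch.nn.functional as F

def combined_tts_loss(pred_mel, target_mel, second_enc_out, ssl_features, aux_weight=0.1):
    """Main spectrogram loss plus an auxiliary SSL reconstruction loss.

    pred_mel / target_mel: (batch, frames, n_mels)
    second_enc_out:        (batch, frames, ssl_dim), predictions of SSL features
    ssl_features:          (batch, frames, ssl_dim), from a frozen SSL model
    aux_weight:            placeholder weighting, not taken from the paper
    """
    mel_loss = F.l1_loss(pred_mel, target_mel)
    aux_loss = F.l1_loss(second_enc_out, ssl_features)
    return mel_loss + aux_weight * aux_loss

# Toy shapes only; real SSL features would come from e.g. a wav2vec-style model.
loss = combined_tts_loss(
    torch.randn(2, 50, 80), torch.randn(2, 50, 80),
    torch.randn(2, 50, 768), torch.randn(2, 50, 768),
)
```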

Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech

no code yet • 31 Jul 2023

Neural text-to-speech systems are often optimized on L1/L2 losses, which make strong assumptions about the distributions of the target data space.
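
The assumption being referred to is the standard likelihood view of these losses: minimizing L2 is equivalent to maximum-likelihood training under a fixed-variance Gaussian centred on the prediction, and L1 corresponds to a Laplace distribution, both unimodal:

```latex
-\log p(y \mid \hat{y}) =
\begin{cases}
\dfrac{\lVert y - \hat{y} \rVert_2^2}{2\sigma^2} + \text{const}, & p = \mathcal{N}\!\left(y;\, \hat{y},\, \sigma^2 I\right) \\[1.5ex]
\dfrac{\lVert y - \hat{y} \rVert_1}{b} + \text{const}, & p = \mathrm{Laplace}\!\left(y;\, \hat{y},\, b\right)
\end{cases}
```

Normalizing flows and diffusion models instead learn a full, possibly multimodal, distribution over prosody and acoustic features, which is the comparison this paper makes.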

SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs

no code yet • 18 Jul 2023

In recent years, large-scale pre-trained speech language models (SLMs) have demonstrated remarkable advancements in various generative speech modeling applications, such as text-to-speech synthesis, voice conversion, and speech enhancement.