Text-To-Speech Synthesis
92 papers with code • 6 benchmarks • 17 datasets
Text-To-Speech Synthesis is a machine learning task that involves converting written text into spoken words. The goal is to generate synthetic speech that sounds natural and resembles human speech as closely as possible.
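Most neural TTS systems start with a text frontend that normalizes the input and maps it to integer symbol IDs for the acoustic model. The sketch below illustrates that first stage; the symbol set and normalization rules are simplified assumptions, not any specific system's.

```python
# Minimal sketch of a typical neural-TTS text frontend (illustrative):
# normalize the input text, then map characters to integer IDs that an
# acoustic model could consume downstream.
import re

SYMBOLS = list("abcdefghijklmnopqrstuvwxyz ,.?!'")
SYMBOL_TO_ID = {s: i for i, s in enumerate(SYMBOLS)}

def normalize(text: str) -> str:
    """Lowercase, collapse whitespace, and drop unsupported characters."""
    text = re.sub(r"\s+", " ", text.lower()).strip()
    return "".join(ch for ch in text if ch in SYMBOL_TO_ID)

def encode(text: str) -> list[int]:
    """Convert normalized text to a sequence of symbol IDs."""
    return [SYMBOL_TO_ID[ch] for ch in normalize(text)]

ids = encode("Hello,  world!")
```

Real frontends add steps such as number expansion and grapheme-to-phoneme conversion, but the output is the same kind of ID sequence.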
Libraries
Use these libraries to find Text-To-Speech Synthesis models and implementations.
Datasets
Latest papers
Matcha-TTS: A fast TTS architecture with conditional flow matching
We introduce Matcha-TTS, a new encoder-decoder architecture for speedy TTS acoustic modelling, trained using optimal-transport conditional flow matching (OT-CFM).
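In the basic OT-CFM formulation, a training example is a point on the straight path between a noise sample x0 and a data sample x1, and the network regresses the constant velocity x1 - x0. A minimal sketch of that target construction (simplified: the full objective also blends in a small sigma_min, and the regressed vector field is a neural network, omitted here):

```python
# Illustrative sketch of the OT-CFM training target: points on the straight
# path x_t = (1 - t) * x0 + t * x1 have constant target velocity x1 - x0.
import numpy as np

def ot_cfm_pair(x1: np.ndarray, t: float, rng) -> tuple[np.ndarray, np.ndarray]:
    """Return (x_t, target_velocity) for one data sample x1 at time t."""
    x0 = rng.standard_normal(x1.shape)   # noise endpoint of the path
    x_t = (1.0 - t) * x0 + t * x1        # point on the straight path
    v_target = x1 - x0                   # constant OT velocity
    return x_t, v_target

rng = np.random.default_rng(0)
x1 = rng.standard_normal(4)              # stand-in for a mel-spectrogram frame
x_t, v = ot_cfm_pair(x1, t=0.3, rng=rng)
```

Straight paths are why flow-matching models like this can synthesize in few ODE-solver steps: following v from x_t for the remaining time (1 - t) lands exactly on x1.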
Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation
A single pre-trained model with UTUT can be employed for diverse multilingual speech- and text-related tasks, such as Speech-to-Speech Translation (STS), multilingual Text-to-Speech Synthesis (TTS), and Text-to-Speech Translation (TTST).
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
Similar to GPT, Voicebox can perform many different tasks through in-context learning, but is more flexible as it can also condition on future context.
Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration
This work aims to build a multilingual text-to-speech (TTS) synthesis system for ten lower-resourced Turkic languages: Azerbaijani, Bashkir, Kazakh, Kyrgyz, Sakha, Tatar, Turkish, Turkmen, Uyghur, and Uzbek.
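Transliterating every language into one shared script lets a single multilingual model reuse the same symbol inventory. A toy illustration of the idea, with a hypothetical mapping table covering only a few Kazakh Cyrillic letters (not the paper's actual scheme):

```python
# Toy script-transliteration step for a multilingual TTS frontend: map
# Cyrillic characters to a shared Latin representation. The table below is
# illustrative only and covers just a handful of Kazakh letters.
CYR_TO_LAT = {
    "с": "s", "ә": "a", "л": "l", "е": "e", "м": "m", " ": " ",
}

def transliterate(text: str) -> str:
    """Replace known Cyrillic characters; pass others through unchanged."""
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in text.lower())

latin = transliterate("Сәлем")   # a Kazakh greeting
```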
Enhancing Suno's Bark Text-to-Speech Model: Addressing Limitations Through Meta's Encodec and Pre-Trained Hubert
Keywords: Bark, AI voice cloning, Suno, text-to-speech, artificial intelligence, audio generation, Meta's Encodec, audio codebooks, semantic tokens, HuBERT, transformer-based model, multilingual speech, wav2vec, linear projection head, embedding space, generative capabilities, pretrained model checkpoints
Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling
We propose a cross-lingual neural codec language model, VALL-E X, for cross-lingual speech synthesis.
Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech
This is the first time that face images are used as a condition to train a TTS model.
A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech
Recent Text-to-Speech (TTS) systems trained on reading or acted corpora have achieved near human-level naturalness.
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
In addition, we find that VALL-E can preserve the speaker's emotion and the acoustic environment of the acoustic prompt during synthesis.
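VALL-E's core idea is to treat TTS as language modeling over discrete audio codec tokens: speech is quantized into token sequences, and generation is next-token prediction. Purely as an illustration, a bigram count model below stands in for the transformer:

```python
# Toy sketch of the "codec language model" idea: generation is next-token
# prediction over discrete codec tokens. A bigram count model replaces the
# transformer here; this is illustrative, not VALL-E's architecture.
from collections import Counter, defaultdict

def train_bigram(sequences):
    """Count next-token frequencies for each token."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return counts

def greedy_continue(counts, prompt, n):
    """Greedily extend a codec-token prompt by up to n tokens."""
    out = list(prompt)
    for _ in range(n):
        nxt = counts[out[-1]].most_common(1)
        if not nxt:
            break
        out.append(nxt[0][0])
    return out

# Toy "codec token" sequences (integers from a small codebook).
data = [[1, 2, 3, 1, 2, 3], [2, 3, 1, 2]]
model = train_bigram(data)
tokens = greedy_continue(model, [1], 3)
```

In the real system, the prompt would also contain phoneme tokens and an enrolled speaker's codec tokens, which is what enables zero-shot voice cloning.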
RWEN-TTS: Relation-aware Word Encoding Network for Natural Text-to-Speech Synthesis
With the advent of deep learning, many text-to-speech (TTS) models that produce human-like speech have emerged.