Text-To-Speech Synthesis
92 papers with code • 6 benchmarks • 17 datasets
Text-To-Speech Synthesis is a machine learning task that involves converting written text into spoken words. The goal is to generate synthetic speech that sounds natural and resembles human speech as closely as possible.
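To make the task concrete, here is a minimal, self-contained sketch of the classic TTS pipeline stages — text normalization, grapheme-to-phoneme conversion, and waveform generation — using a toy per-letter "phoneme" mapping and plain sine tones in place of a learned acoustic model and vocoder. All function names and the letter-to-frequency mapping are illustrative assumptions, not taken from any system listed below:

```python
import math
import struct
import wave


def normalize(text: str) -> str:
    # Stage 1: text normalization (toy version: lowercase, drop punctuation).
    return "".join(c for c in text.lower() if c.isalpha() or c == " ")


def g2p(text: str) -> list:
    # Stage 2: grapheme-to-phoneme conversion. Real systems use a
    # pronunciation lexicon or a learned G2P model; here each letter
    # simply stands in for one "phoneme".
    return [c for c in normalize(text) if c != " "]


def synthesize(phonemes: list, rate: int = 16000, dur: float = 0.08) -> list:
    # Stage 3: acoustic modelling + vocoding. A neural model would predict
    # spectrogram frames and a vocoder would render them; this toy maps
    # each phoneme to a fixed-frequency sine tone.
    samples = []
    for p in phonemes:
        freq = 200.0 + 20.0 * (ord(p) - ord("a"))  # arbitrary toy mapping
        n = int(rate * dur)
        samples.extend(math.sin(2 * math.pi * freq * i / rate) for i in range(n))
    return samples


def write_wav(path: str, samples: list, rate: int = 16000) -> None:
    # Serialize float samples in [-1, 1] as 16-bit mono PCM.
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(rate)
        f.writeframes(b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples))


samples = synthesize(g2p("hello world"))
write_wav("hello.wav", samples)
```

Modern systems replace stages 2 and 3 with learned models (e.g., diffusion- or GAN-based acoustic models and neural vocoders), but the overall text-to-waveform flow is the same.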
Libraries
Use these libraries to find Text-To-Speech Synthesis models and implementations.
Latest papers
KazEmoTTS: A Dataset for Kazakh Emotional Text-to-Speech Synthesis
This study focuses on the creation of the KazEmoTTS dataset, designed for emotional Kazakh text-to-speech (TTS) applications.
CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models
The pursuit of modern models, like Diffusion Models (DMs), holds promise for achieving high-fidelity, real-time speech synthesis.
Learning Arousal-Valence Representation from Categorical Emotion Labels of Speech
In this work, we propose to learn the AV representation from categorical emotion labels of speech.
Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning
The approach involved finetuning a multi-speaker TTS model to work with child speech.
Back Transcription as a Method for Evaluating Robustness of Natural Language Understanding Models to Speech Recognition Errors
This paper proposes a method for investigating the impact of speech recognition errors on the performance of natural language understanding models.
ArTST: Arabic Text and Speech Transformer
We present ArTST, a pre-trained Arabic text and speech transformer for supporting open-source speech technologies for the Arabic language.
Generative Adversarial Training for Text-to-Speech Synthesis Based on Raw Phonetic Input and Explicit Prosody Modelling
We describe an end-to-end speech synthesis system that uses generative adversarial training.
Attentive Multi-Layer Perceptron for Non-autoregressive Generation
Furthermore, we marry AMLP with popular NAR models, deriving a highly efficient NAR-AMLP architecture with linear time and space complexity.
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT
In this paper, we propose LauraGPT, a unified GPT model for audio recognition, understanding, and generation.
FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec
We also demonstrate that the pre-trained models are suitable for downstream tasks, including automatic speech recognition and personalized text-to-speech synthesis.