Speech Synthesis

286 papers with code • 4 benchmarks • 19 datasets

Speech synthesis is the task of generating speech from another modality, such as text or lip movements.

Note that the leaderboards here are not directly comparable across studies. They report mean opinion score (MOS), and each study collects its ratings from a different sample of Amazon Mechanical Turk workers.
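The following minimal Python sketch makes the comparability caveat concrete. It shows how a MOS and an approximate 95% confidence interval are commonly computed from listener ratings. The 1-5 rating scale, the normal approximation (z = 1.96), the hypothetical helper `mos_with_ci`, and the two rater pools are all illustrative assumptions. They are not taken from any study listed here. Both the mean and the interval depend on which raters happen to be sampled, so two pools can score the same system differently.

```python
import math
from statistics import mean, stdev


def mos_with_ci(ratings, z=1.96):
    """Hypothetical helper: mean opinion score plus confidence-interval half-width.

    Assumes integer ratings on a 1-5 scale and a normal approximation
    for the ~95% confidence interval (z = 1.96).
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate spread")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = mean(ratings)
    half_width = z * stdev(ratings) / math.sqrt(len(ratings))  # z * standard error
    return mos, half_width


# Two made-up rater pools scoring the same system.
pool_a = [4, 5, 4, 4, 3, 5, 4, 4]
pool_b = [3, 4, 3, 4, 3, 4, 3, 3]
for name, pool in [("pool A", pool_a), ("pool B", pool_b)]:
    mos, hw = mos_with_ci(pool)
    print(f"{name}: MOS = {mos:.2f} ± {hw:.2f}")
```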

(Image credit: WaveNet: A Generative Model for Raw Audio)


Towards Decoding Brain Activity During Passive Listening of Speech

milaniusz/speech2brain2speech 26 Feb 2024

The aim of the study is to investigate the complex mechanisms of speech perception and ultimately decode the electrical changes in the brain occurring while listening to speech.

3 GitHub stars

Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling

walker-hyf/ecss 19 Dec 2023

Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting.

33 GitHub stars

What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection

cecile-hi/regularized-adaptive-weight-modification 15 Dec 2023

The rapid evolution of speech synthesis and voice conversion has raised substantial concerns due to the potential misuse of such technology, prompting a pressing need for effective audio deepfake detection mechanisms.

13 GitHub stars

Neural Text to Articulate Talk: Deep Text to Audiovisual Speech Synthesis achieving both Auditory and Photo-realism

g-milis/NEUTART 11 Dec 2023

Our method, which we call NEUral Text to ARticulate Talk (NEUTART), is a talking face generator that uses a joint audiovisual feature space, as well as speech-informed 3D facial reconstructions and a lip-reading loss for visual supervision.

19 GitHub stars

Learning Arousal-Valence Representation from Categorical Emotion Labels of Speech

ETZET/SpeechEmotionAVLearning 24 Nov 2023

In this work, we propose to learn the AV representation from categorical emotion labels of speech.

3 GitHub stars

HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis

sh-lee-prml/hierspeechpp 21 Nov 2023

Furthermore, we significantly improve the naturalness and speaker similarity of synthetic speech even in zero-shot speech synthesis scenarios.

1,039 GitHub stars

APNet2: High-quality and High-efficiency Neural Vocoder with Direct Prediction of Amplitude and Phase Spectra

redmist328/apnet2 20 Nov 2023

APNet generates synthesized speech of quality comparable to the HiFi-GAN vocoder, with considerably faster inference.

43 GitHub stars
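The APNet2 entry describes a vocoder that predicts amplitude and phase spectra directly. The last step of such a design is to turn those two spectra back into a waveform, and this can be sketched with an inverse short-time Fourier transform. The NumPy sketch below is an illustration under stated assumptions, not APNet2's code. The neural predictor is omitted. The frame length (512), hop (128), periodic Hann window, and helper names are hypothetical choices. The "predicted" spectra are taken from an STFT of a random test signal so that the round trip can be checked.

```python
import numpy as np

N_FFT = 512  # hypothetical frame length
HOP = 128    # hypothetical hop size (N_FFT // 4)


def periodic_hann(n_fft):
    # Periodic Hann window; with hop = n_fft // 4 its squared overlap-add
    # is constant away from the signal edges.
    return np.hanning(n_fft + 1)[:-1]


def stft(x, n_fft=N_FFT, hop=HOP):
    """Short-time Fourier transform: one rfft per windowed frame."""
    window = periodic_hann(n_fft)
    starts = range(0, len(x) - n_fft + 1, hop)
    frames = np.stack([x[s:s + n_fft] * window for s in starts])
    return np.fft.rfft(frames, axis=-1)  # shape: (num_frames, n_fft // 2 + 1)


def istft(spec, n_fft=N_FFT, hop=HOP):
    """Inverse STFT by weighted overlap-add, normalized by the summed squared window."""
    window = periodic_hann(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=-1)
    length = hop * (frames.shape[0] - 1) + n_fft
    out = np.zeros(length)
    norm = np.zeros(length)
    for k, frame in enumerate(frames):
        s = k * hop
        out[s:s + n_fft] += frame * window
        norm[s:s + n_fft] += window ** 2
    safe = norm > 1e-8  # skip samples where the window is (almost) zero
    out[safe] /= norm[safe]
    return out


def waveform_from_amp_phase(log_amp, phase, n_fft=N_FFT, hop=HOP):
    """Rebuild the complex spectrum from log-amplitude and phase, then invert it."""
    spec = np.exp(log_amp) * np.exp(1j * phase)
    return istft(spec, n_fft, hop)


# Round trip: stand-in "predictions" taken from a real STFT of a test signal.
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
spec = stft(x)
log_amp = np.log(np.abs(spec) + 1e-12)  # small offset avoids log(0)
phase = np.angle(spec)
y = waveform_from_amp_phase(log_amp, phase)
print(np.max(np.abs(y[N_FFT:-N_FFT] - x[N_FFT:-N_FFT])))
```

Predicting log-amplitude rather than raw amplitude keeps the reconstructed magnitude positive through `np.exp`. In a real vocoder the phase would come from the network instead of `np.angle`.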

ChatGPT in the context of precision agriculture data analytics

potamitis123/chatgpt-in-the-context-of-precision-agriculture-data-analytics 10 Nov 2023

In this work, we argue that ChatGPT's speech-recognition input modality offers policy makers a more intuitive and natural way to interact with the server database of an agricultural data-processing system that receives reports from a large, dispersed network of automated insect traps and sensor probes.

2 GitHub stars

Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning

c3imaging/child_tts_fastpitch 7 Nov 2023

The approach involved fine-tuning a multi-speaker TTS model to work with child speech.

2 GitHub stars

ArTST: Arabic Text and Speech Transformer

mbzuai-nlp/artst 25 Oct 2023

We present ArTST, a pre-trained Arabic text and speech transformer for supporting open-source speech technologies for the Arabic language.

13 GitHub stars