Expressive Speech Synthesis

Most implemented papers

Exploring Transfer Learning for Low Resource Emotional TTS

Emotional-Text-to-Speech/dl-for-emo-tts Advances in Intelligent Systems and Computing 2019

During the last few years, spoken language technologies have improved substantially thanks to deep learning.

Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron

syang1993/gst-tacotron ICML 2018

We present an extension to the Tacotron speech synthesis architecture that learns a latent embedding space of prosody, derived from a reference acoustic representation containing the desired prosody.

Enhancing Suno's Bark Text-to-Speech Model: Addressing Limitations Through Meta's Encodec and Pre-Trained Hubert

serp-ai/bark-with-voice-clone Social Science Research Network (SSRN) 2023

Keywords: Bark, AI voice cloning, Suno, text-to-speech, artificial intelligence, audio generation, Meta's EnCodec, audio codebooks, semantic tokens, HuBERT, transformer-based model, multilingual speech, wav2vec, linear projection head, embedding space, generative capabilities, pretrained model checkpoints

Robust and fine-grained prosody control of end-to-end speech synthesis

keonlee9420/Robust_Fine_Grained_Prosody_Control 6 Nov 2018

We propose prosody embeddings for emotional and expressive speech synthesis networks.

Visualization and Interpretation of Latent Spaces for Controlling Expressive Speech Synthesis through Audio Analysis

noetits/ICE-Talk 27 Mar 2019

The field of text-to-speech has improved greatly in recent years, benefiting from deep learning techniques.

Laughter Synthesis: Combining Seq2seq modeling with Transfer Learning

numediart/LaughterSynthesis 20 Aug 2020

Despite the growing interest in expressive speech synthesis, the synthesis of nonverbal expressions remains an under-explored area.

EMNS /Imz/ Corpus: An emotive single-speaker dataset for narrative storytelling in games, television and graphic novels

knoriy/emns-dct 22 May 2023

The increasing adoption of text-to-speech technologies has led to a growing demand for natural and emotive voices that adapt to a conversation's context and emotional tone.

SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer

0913ktg/sc_vall-e 20 Jul 2023

Expressive speech synthesis models are trained by adding corpora with diverse speakers, various emotions, and different speaking styles to the dataset, in order to control various characteristics of speech and generate the desired voice.

DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training

hsoh0306/diffprosody 31 Jul 2023

Expressive text-to-speech systems have undergone significant advancements owing to prosody modeling, but conventional methods can still be improved.