Emotional Speech Synthesis
3 papers with code • 0 benchmarks • 2 datasets
Latest papers with no code
ED-TTS: Multi-Scale Emotion Modeling using Cross-Domain Emotion Diarization for Emotional Speech Synthesis
We introduce ED-TTS, a multi-scale emotional speech synthesis model that leverages Speech Emotion Diarization (SED) and Speech Emotion Recognition (SER) to model emotions at different levels.
QI-TTS: Questioning Intonation Control for Emotional Speech Synthesis
Recent expressive text-to-speech (TTS) models focus on synthesizing emotional speech, but some fine-grained styles, such as intonation, are neglected.
Semi-supervised learning for continuous emotional intensity controllable speech synthesis with disentangled representations
However, it is difficult to control continuous emotional intensity in the latent space produced by existing models, because features such as emotion and speaker identity are entangled.
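Once emotion and speaker features are disentangled, a common way to expose continuous intensity control is to interpolate between a neutral and an emotional embedding. The sketch below illustrates that general idea only; it is not the paper's actual method, and the function and variable names are hypothetical.

```python
import numpy as np

def scale_intensity(neutral_emb, emotion_emb, alpha):
    """Linearly interpolate between a neutral and an emotional embedding.

    alpha in [0, 1] acts as a continuous intensity knob:
    0 -> fully neutral, 1 -> full-strength emotion.
    """
    return (1.0 - alpha) * neutral_emb + alpha * emotion_emb

# Toy embeddings (real systems would learn these, e.g. from a reference encoder).
neutral = np.zeros(4)
angry = np.ones(4)
half = scale_intensity(neutral, angry, 0.5)  # midway-intensity embedding
```

This only works as an intensity control when the embedding space is disentangled; otherwise moving along the interpolation path also shifts speaker or content attributes, which is exactly the problem the paper highlights.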
Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis
From these features, the proposed periodicity generator produces a sample-level sinusoidal source that enables the waveform decoder to accurately reproduce the pitch.
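A sample-level sinusoidal source can be built by upsampling frame-level F0 to the sample rate and integrating it into a phase track, with zeros in unvoiced regions. This is a minimal NumPy sketch of that general excitation-signal idea, not the Period VITS periodicity generator itself; the hop length and sample rate below are illustrative assumptions.

```python
import numpy as np

def sinusoidal_source(f0_frames, hop_length=256, sr=22050):
    """Produce a sample-level sinusoidal excitation from frame-level F0.

    Frame-level F0 (Hz, 0 = unvoiced) is upsampled by repetition,
    cumulatively integrated into instantaneous phase, and turned into
    a sine wave; unvoiced samples are set to zero.
    """
    f0 = np.repeat(np.asarray(f0_frames, dtype=float), hop_length)
    voiced = f0 > 0
    phase = 2.0 * np.pi * np.cumsum(f0 / sr)  # integrate frequency -> phase
    return np.where(voiced, np.sin(phase), 0.0)

# Four voiced frames at 100 Hz followed by one unvoiced frame.
src = sinusoidal_source([100, 100, 100, 100, 0])
```

Feeding such a pitch-locked source to the waveform decoder is what lets it reproduce F0 accurately instead of inferring periodicity from scratch.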
Speech Synthesis with Mixed Emotions
We then incorporate our formulation into a sequence-to-sequence emotional text-to-speech framework.
GANtron: Emotional Speech Synthesis with Generative Adversarial Networks
Speech synthesis is used in a wide variety of industries.
EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model
Finally, we demonstrate the effectiveness of the proposed model by showing comparable performance on the emotional speech synthesis task.
Sentiment Analysis for Emotional Speech Synthesis in a News Dialogue System
As smart speakers and conversational robots become ubiquitous, the demand for expressive speech synthesis has increased.
Multi-stream Attention-based BLSTM with Feature Segmentation for Speech Emotion Recognition
One of the model’s weaknesses is that it cannot consider the statistics of speech features, which are known to be effective for speech emotion recognition.
End-to-End Emotional Speech Synthesis Using Style Tokens and Semi-Supervised Training
Objective and subjective evaluation results show that our model outperforms the conventional Tacotron model for ESS when only 5% of training data has emotion labels.
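Style tokens in this line of work are typically a small bank of learned embeddings that a reference encoding attends over, yielding a style vector that conditions the synthesizer. The snippet below is a minimal NumPy sketch of that attention step under assumed shapes; it is not the paper's implementation, and all names are illustrative.

```python
import numpy as np

def style_embedding(reference_encoding, tokens):
    """Soft attention over a bank of learned style tokens.

    The reference encoding scores each token by dot product; a softmax
    over the scores weights the tokens, and the style embedding is
    their weighted sum (a convex combination of the token bank).
    """
    scores = tokens @ reference_encoding        # (num_tokens,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax over tokens
    return weights @ tokens                     # (dim,)

rng = np.random.default_rng(0)
tokens = rng.standard_normal((10, 8))   # 10 style tokens, embedding dim 8
ref = rng.standard_normal(8)            # stands in for a reference encoder output
emb = style_embedding(ref, tokens)
```

Because the token bank is shared across utterances, only a small labeled subset is needed to associate tokens (or token combinations) with emotion categories, which is what makes semi-supervised training with sparse emotion labels workable.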