Expressive Speech Synthesis
11 papers with code • 0 benchmarks • 0 datasets
Latest papers with no code
Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis
In this paper, we propose a hierarchical framework to model speaking style from context.
Word-Level Style Control for Expressive, Non-attentive Speech Synthesis
This paper presents an expressive speech synthesis architecture for modeling and controlling the speaking style at a word level.
Referee: Towards reference-free cross-speaker style transfer with low-quality data for expressive speech synthesis
The S2W model is trained with high-quality target data, which is adopted to effectively aggregate style descriptors and generate high-fidelity speech in the target speaker's voice.
Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis
Secondly, in these models the content/text, prosody, and speaker timbre are usually highly entangled; it is therefore not realistic to expect a satisfactory result when freely combining these components, for example when transferring speaking style between speakers.
UniTTS: Residual Learning of Unified Embedding Space for Speech Style Control
We propose a novel high-fidelity expressive speech synthesis model, UniTTS, that learns and controls overlapping style attributes while avoiding interference.
Towards Multi-Scale Style Control for Expressive Speech Synthesis
This paper introduces a multi-scale speech style modeling method for end-to-end expressive speech synthesis.
Sentiment Analysis for Emotional Speech Synthesis in a News Dialogue System
As smart speakers and conversational robots become ubiquitous, the demand for expressive speech synthesis has increased.
Hierarchical Multi-Grained Generative Model for Expressive Speech Synthesis
This framework consists of a multi-grained variational autoencoder, a conditional prior, and a multi-level auto-regressive latent converter to obtain the different time-resolution latent variables and sample the finer-level latent variables from the coarser-level ones by taking into account the input text.
Using VAEs and Normalizing Flows for One-shot Text-To-Speech Synthesis of Expressive Speech
We propose a Text-to-Speech method to create an unseen expressive style using one utterance of expressive speech of around one second.
The Theory behind Controllable Expressive Speech Synthesis: a Cross-disciplinary Approach
Finally, we focus on the last one, with the latest techniques modeling Text-to-Speech synthesis as a sequence-to-sequence problem.