Expressive Speech Synthesis

11 papers with code • 0 benchmarks • 0 datasets

Latest papers with no code

Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis

no code yet • 23 Mar 2022

In this paper, we propose a hierarchical framework to model speaking style from context.

Word-Level Style Control for Expressive, Non-attentive Speech Synthesis

no code yet • 19 Nov 2021

This paper presents an expressive speech synthesis architecture for modeling and controlling the speaking style at a word level.

Referee: Towards reference-free cross-speaker style transfer with low-quality data for expressive speech synthesis

no code yet • 8 Sep 2021

The S2W model is trained with high-quality target data, which is adopted to effectively aggregate style descriptors and generate high-fidelity speech in the target speaker's voice.

Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis

no code yet • 27 Jul 2021

Secondly, in these models the content/text, prosody, and speaker timbre are usually highly entangled, so it is not realistic to expect a satisfactory result when freely combining these components, for example when transferring speaking style between speakers.

UniTTS: Residual Learning of Unified Embedding Space for Speech Style Control

no code yet • 21 Jun 2021

We propose a novel high-fidelity expressive speech synthesis model, UniTTS, that learns and controls overlapping style attributes while avoiding interference.

Towards Multi-Scale Style Control for Expressive Speech Synthesis

no code yet • 8 Apr 2021

This paper introduces a multi-scale speech style modeling method for end-to-end expressive speech synthesis.

Sentiment Analysis for Emotional Speech Synthesis in a News Dialogue System

no code yet • COLING 2020

As smart speakers and conversational robots become ubiquitous, the demand for expressive speech synthesis has increased.

Hierarchical Multi-Grained Generative Model for Expressive Speech Synthesis

no code yet • 17 Sep 2020

This framework consists of a multi-grained variational autoencoder, a conditional prior, and a multi-level auto-regressive latent converter. Together they obtain latent variables at different time resolutions and sample the finer-level latent variables from the coarser-level ones, taking the input text into account.

Using VAEs and Normalizing Flows for One-shot Text-To-Speech Synthesis of Expressive Speech

no code yet • 28 Nov 2019

We propose a Text-to-Speech method to create an unseen expressive style using one utterance of expressive speech of around one second.

The Theory behind Controllable Expressive Speech Synthesis: a Cross-disciplinary Approach

no code yet • 14 Oct 2019

Finally, we focus on the last one, with the latest techniques modeling Text-to-Speech synthesis as a sequence-to-sequence problem.