Prosody Prediction
2 papers with code • 1 benchmark • 2 datasets
Predicting prosodic prominence from text. This is a binary classification task: each word in a sentence is assigned the label 1 (prominent) or 0 (non-prominent).
(Image credit: Helsinki Prosody Corpus)
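To make the word-level labeling described above concrete, here is a minimal Python sketch. It is not taken from any of the listed papers or from the Helsinki Prosody Corpus tooling. The function-word list, the `predict_prominence` baseline, the `accuracy` helper, and the example labels are all hypothetical and serve only to show the input and output shape of the task.

```python
from typing import List

# Hypothetical function-word list (an assumption for illustration only);
# a real system would learn prominence from annotated data.
FUNCTION_WORDS = {"a", "an", "the", "of", "to", "in", "on", "and", "or", "is", "was", "it"}


def predict_prominence(words: List[str]) -> List[int]:
    """Naive baseline: label a word 1 (prominent) unless it is a function word."""
    return [0 if word.lower() in FUNCTION_WORDS else 1 for word in words]


def accuracy(predicted: List[int], gold: List[int]) -> float:
    """Fraction of words whose predicted label matches the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    words = "the cat sat on the mat".split()
    gold = [0, 1, 1, 0, 0, 1]  # made-up labels, not from any corpus
    predicted = predict_prominence(words)
    for word, label in zip(words, predicted):
        print(f"{word}\t{label}")
    print("accuracy:", accuracy(predicted, gold))
```

Each word receives exactly one label, so a prediction is a list of 0/1 values the same length as the sentence. Per-word accuracy is used here as one plausible metric; the benchmark's actual metric may differ.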
Latest papers with no code
Prosody Analysis of Audiobooks
Recent advances in text-to-speech have made it possible to generate natural-sounding audio from text.
A Comparative Analysis of Pretrained Language Models for Text-to-Speech
In this study, we aim to address this gap by conducting a comparative analysis of different PLMs for two TTS tasks: prosody prediction and pause prediction.
Learning Multilingual Expressive Speech Representation for Prosody Prediction without Parallel Data
We propose a method for speech-to-speech emotion-preserving translation that operates at the level of discrete speech units.
What Can an Accent Identifier Learn? Probing Phonetic and Prosodic Information in a Wav2vec2-based Accent Identification Model
This study focuses on understanding and quantifying the change in phoneme and prosody information encoded in the Self-Supervised Learning (SSL) model that is brought about by an accent identification (AID) fine-tuning task.
Ensemble prosody prediction for expressive speech synthesis
Generating expressive speech with rich and varied prosody continues to be a challenge for Text-to-Speech.
Improving Prosody for Cross-Speaker Style Transfer by Semi-Supervised Style Extractor and Hierarchical Modeling in Speech Synthesis
Cross-speaker style transfer in speech synthesis aims at transferring a style from source speaker to synthesized speech of a target speaker's timbre.
Prosody Learning Mechanism for Speech Synthesis System Without Text Length Limit
Recent neural speech synthesis systems have gradually focused on the control of prosody to improve the quality of synthesized speech, but they rarely consider the variability of prosody and the correlation between prosody and semantics together.
Controllable Sequence-To-Sequence Neural TTS with LPCNET Backend for Real-time Speech Synthesis on CPU
State-of-the-art sequence-to-sequence acoustic networks that convert a phonetic sequence to a sequence of spectral features with no explicit prosody prediction generate speech with close to natural quality when cascaded with neural vocoders such as WaveNet.
Automatic Prosody Prediction for Chinese Speech Synthesis using BLSTM-RNN and Embedding Features
Prosody affects the naturalness and intelligibility of speech.