Search Results for author: Sri Karlapati

Found 6 papers, 0 papers with code

Simple and Effective Multi-sentence TTS with Expressive and Coherent Prosody

no code implementations · 29 Jun 2022 · Peter Makarov, Ammar Abbas, Mateusz Łajszczak, Arnaud Joly, Sri Karlapati, Alexis Moinet, Thomas Drugman, Penny Karanasou

In this paper, we examine simple extensions to a Transformer-based FastSpeech-like system, with the goal of improving prosody for multi-sentence TTS.

Language Modelling

Expressive, Variable, and Controllable Duration Modelling in TTS

no code implementations · 28 Jun 2022 · Ammar Abbas, Thomas Merritt, Alexis Moinet, Sri Karlapati, Ewa Muszynska, Simon Slangen, Elia Gatti, Thomas Drugman

First, we propose a duration model conditioned on phrasing that improves the predicted durations and provides better modelling of pauses.

Normalising Flows · Speech Synthesis

CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many Fine-Grained Prosody Transfer

no code implementations · 27 Jun 2022 · Sri Karlapati, Penny Karanasou, Mateusz Łajszczak, Ammar Abbas, Alexis Moinet, Peter Makarov, Ray Li, Arent van Korlaar, Simon Slangen, Thomas Drugman

In this paper, we present CopyCat2 (CC2), a novel model capable of: a) synthesising speech with different speaker identities, b) generating speech with expressive and contextually appropriate prosody, and c) transferring prosody at a fine-grained level between any pair of seen speakers.

Multi-Scale Spectrogram Modelling for Neural Text-to-Speech

no code implementations · 29 Jun 2021 · Ammar Abbas, Bajibabu Bollepalli, Alexis Moinet, Arnaud Joly, Penny Karanasou, Peter Makarov, Simon Slangens, Sri Karlapati, Thomas Drugman

We propose a novel Multi-Scale Spectrogram (MSS) modelling approach to synthesise speech with improved coarse and fine-grained prosody.

Prosodic Representation Learning and Contextual Sampling for Neural Text-to-Speech

no code implementations · 4 Nov 2020 · Sri Karlapati, Ammar Abbas, Zack Hodari, Alexis Moinet, Arnaud Joly, Penny Karanasou, Thomas Drugman

In Stage II, we propose a novel method to sample from this learnt prosodic distribution using the contextual information available in text.

Graph Attention · Representation Learning +1
