no code implementations • 19 Nov 2021 • Myrsini Christidou, Alexandra Vioni, Nikolaos Ellinas, Georgios Vamvoukakis, Konstantinos Markopoulos, Panos Kakoulidis, June Sig Sung, Hyoungmin Park, Aimilios Chalamandaris, Pirros Tsiakoulis
This paper presents a method, based on prosodic clustering, for phoneme-level prosody control of F0 and duration in a multispeaker text-to-speech setup.
no code implementations • 19 Nov 2021 • Konstantinos Klapsas, Nikolaos Ellinas, June Sig Sung, Hyoungmin Park, Spyros Raptis
This paper presents an expressive speech synthesis architecture for modeling and controlling the speaking style at a word level.
no code implementations • 19 Nov 2021 • Alexandra Vioni, Myrsini Christidou, Nikolaos Ellinas, Georgios Vamvoukakis, Panos Kakoulidis, TaeHoon Kim, June Sig Sung, Hyoungmin Park, Aimilios Chalamandaris, Pirros Tsiakoulis
This paper presents a method for controlling the prosody at the phoneme level in an autoregressive attention-based text-to-speech system.
no code implementations • 17 Nov 2021 • Nikolaos Ellinas, Georgios Vamvoukakis, Konstantinos Markopoulos, Aimilios Chalamandaris, Georgia Maniati, Panos Kakoulidis, Spyros Raptis, June Sig Sung, Hyoungmin Park, Pirros Tsiakoulis
This paper presents an end-to-end text-to-speech system with low latency on a CPU, suitable for real-time applications.
no code implementations • 17 Nov 2021 • Georgia Maniati, Nikolaos Ellinas, Konstantinos Markopoulos, Georgios Vamvoukakis, June Sig Sung, Hyoungmin Park, Aimilios Chalamandaris, Pirros Tsiakoulis
Subsequently, we fine-tune the model with very limited data of a new speaker's voice in either a seen or an unseen language, and achieve synthetic speech of equal quality, while preserving the target speaker's identity.
no code implementations • 17 Nov 2021 • Konstantinos Markopoulos, Nikolaos Ellinas, Alexandra Vioni, Myrsini Christidou, Panos Kakoulidis, Georgios Vamvoukakis, Georgia Maniati, June Sig Sung, Hyoungmin Park, Pirros Tsiakoulis, Aimilios Chalamandaris
This paper introduces a text-to-rapping/singing system that can be adapted to any speaker's voice.