no code implementations • 26 Jan 2023 • Mikolaj Babianski, Kamil Pokora, Raahil Shah, Rafal Sienkiewicz, Daniel Korzekwa, Viacheslav Klimkov
In expressive speech synthesis it is widely adopted to use latent prosody representations to deal with variability of the data during training.
no code implementations • 10 Nov 2022 • Abdelhamid Ezzerg, Thomas Merritt, Kayoko Yanagisawa, Piotr Bilinski, Magdalena Proszewska, Kamil Pokora, Renard Korzeniowski, Roberto Barra-Chicote, Daniel Korzekwa
Regional accents of the same language affect not only how words are pronounced (i. e., phonetic content), but also impact prosodic aspects of speech such as speaking rate and intonation.
no code implementations • 15 Mar 2022 • Thomas Merritt, Abdelhamid Ezzerg, Piotr Biliński, Magdalena Proszewska, Kamil Pokora, Roberto Barra-Chicote, Daniel Korzekwa
We investigate normalising flows for VC in both text-conditioned and text-free scenarios.
no code implementations • 13 Aug 2021 • Abdelhamid Ezzerg, Adam Gabrys, Bartosz Putrycz, Daniel Korzekwa, Daniel Saez-Trigueros, David McHardy, Kamil Pokora, Jakub Lachowicz, Jaime Lorenzo-Trueba, Viacheslav Klimkov
Artificial speech synthesis has made a great leap in terms of naturalness as recent Text-to-Speech (TTS) systems are capable of producing speech with similar quality to human recordings.
no code implementations • 24 Jun 2021 • Raahil Shah, Kamil Pokora, Abdelhamid Ezzerg, Viacheslav Klimkov, Goeric Huybrechts, Bartosz Putrycz, Daniel Korzekwa, Thomas Merritt
In this paper, we present a method for building highly expressive TTS voices with as little as 15 minutes of speech data from the target speaker.