no code implementations • 26 Jan 2023 • Mikolaj Babianski, Kamil Pokora, Raahil Shah, Rafal Sienkiewicz, Daniel Korzekwa, Viacheslav Klimkov
In expressive speech synthesis it is widely adopted to use latent prosody representations to deal with variability of the data during training.
no code implementations • 18 Jun 2022 • Rohit Saha, Mengyi Fang, Angeline Yasodhara, Kyryl Truskovskyi, Azin Asgarian, Daniel Homola, Raahil Shah, Frederik Dieleman, Jack Weatheritt, Thomas Rogers
In this work, we propose a multimodal framework (GaLeNet) for assessing the severity of damage by complementing pre-disaster images with weather data and the trajectory of the hurricane.
no code implementations • 24 Jun 2021 • Raahil Shah, Kamil Pokora, Abdelhamid Ezzerg, Viacheslav Klimkov, Goeric Huybrechts, Bartosz Putrycz, Daniel Korzekwa, Thomas Merritt
In this paper, we present a method for building highly expressive TTS voices with as little as 15 minutes of speech data from the target speaker.
no code implementations • 11 Nov 2020 • Goeric Huybrechts, Thomas Merritt, Giulia Comini, Bartek Perz, Raahil Shah, Jaime Lorenzo-Trueba
While recent neural text-to-speech (TTS) systems perform remarkably well, they typically require a substantial amount of recordings from the target speaker reading in the desired speaking style.