no code implementations • 22 Dec 2023 • Piotr Bilinski, Thomas Merritt, Abdelhamid Ezzerg, Kamil Pokora, Sebastian Cygert, Kayoko Yanagisawa, Roberto Barra-Chicote, Daniel Korzekwa
As there is growing interest in synthesizing voices of new speakers, here we investigate the ability of normalizing flows in text-to-speech (TTS) and voice conversion (VC) modes to extrapolate from speakers observed during training to create unseen speaker identities.
no code implementations • 15 Sep 2023 • Dariusz Piotrowski, Renard Korzeniowski, Alessio Falai, Sebastian Cygert, Kamil Pokora, Georgi Tinchev, Ziyao Zhang, Kayoko Yanagisawa
In the first two stages, we use a VC model to convert utterances in the target locale to the voice of the target speaker.
no code implementations • 31 Jul 2023 • Guangyan Zhang, Thomas Merritt, Manuel Sam Ribeiro, Biel Tura-Vecino, Kayoko Yanagisawa, Kamil Pokora, Abdelhamid Ezzerg, Sebastian Cygert, Ammar Abbas, Piotr Bilinski, Roberto Barra-Chicote, Daniel Korzekwa, Jaime Lorenzo-Trueba
Neural text-to-speech systems are often optimized on L1/L2 losses, which make strong assumptions about the distributions of the target data space.
no code implementations • 11 Jan 2023 • Georgi Tinchev, Marta Czarnowska, Kamil Deja, Kayoko Yanagisawa, Marius Cotescu
Prior work on modelling accents assumes a phonetic transcription is available for the target accent, which might not be the case for low-resource, regional accents.
no code implementations • 10 Nov 2022 • Abdelhamid Ezzerg, Thomas Merritt, Kayoko Yanagisawa, Piotr Bilinski, Magdalena Proszewska, Kamil Pokora, Renard Korzeniowski, Roberto Barra-Chicote, Daniel Korzekwa
Regional accents of the same language affect not only how words are pronounced (i. e., phonetic content), but also impact prosodic aspects of speech such as speaking rate and intonation.
no code implementations • 4 Jul 2022 • Ariadna Sanchez, Alessio Falai, Ziyao Zhang, Orazio Angelini, Kayoko Yanagisawa
In this paper, we conduct a comprehensive study comparing multilingual NTTS systems models trained with both representations.
no code implementations • 4 Jul 2022 • Ziyao Zhang, Alessio Falai, Ariadna Sanchez, Orazio Angelini, Kayoko Yanagisawa
Training multilingual Neural Text-To-Speech (NTTS) models using only monolingual corpora has emerged as a popular way for building voice cloning based Polyglot NTTS systems.
no code implementations • 12 Dec 2019 • Orazio Angelini, Alexis Moinet, Kayoko Yanagisawa, Thomas Drugman
We present UTACO, a singing synthesis model based on an attention-based sequence-to-sequence mechanism and a vocoder based on dilated causal convolutions.