17 Oct 2024 • José Giraldo, Martí Llopart-Font, Alex Peiró-Lilja, Carme Armentano-Oller, Gerard Sant, Baybars Külebi
High-quality audio data is a critical prerequisite for training robust text-to-speech models, a requirement that often limits the use of opportunistic or crowdsourced datasets.