no code implementations • 19 Jan 2024 • Prabhav Agrawal, Thilo Koehler, Zhiping Xiu, Prashant Serai, Qing He
A DSP vocoder often produces lower audio quality because it consumes over-smoothed acoustic model predictions of approximate vocal tract representations.
no code implementations • 1 Apr 2021 • Qing He, Zhiping Xiu, Thilo Koehler, JiLong Wu
Typical high-quality text-to-speech (TTS) systems today use a two-stage architecture: a spectrum model stage that generates spectral frames and a vocoder stage that generates the actual audio.