no code implementations • 19 Jan 2024 • Prabhav Agrawal, Thilo Koehler, Zhiping Xiu, Prashant Serai, Qing He
A DSP vocoder often yields lower audio quality because it consumes over-smoothed acoustic model predictions of approximate representations of the vocal tract.
no code implementations • 1 Apr 2021 • Qing He, Zhiping Xiu, Thilo Koehler, JiLong Wu
Typical high-quality text-to-speech (TTS) systems today use a two-stage architecture, with a spectrum model stage that generates spectral frames and a vocoder stage that generates the actual audio.
no code implementations • 25 Nov 2020 • Bichen Wu, Qing He, Peizhao Zhang, Thilo Koehler, Kurt Keutzer, Peter Vajda
More efficient variants of FBWave can achieve up to 109x fewer MACs while still delivering acceptable audio quality.
no code implementations • 22 Oct 2019 • Duc Le, Thilo Koehler, Christian Fuegen, Michael L. Seltzer
Grapheme-based acoustic modeling has recently been shown to outperform phoneme-based approaches in both hybrid and end-to-end automatic speech recognition (ASR), even on non-phonemic languages like English.
Automatic Speech Recognition (ASR) +1