Real-time Speech Enhancement on Raw Signals with Deep State-space Modeling

5 Sep 2024  ·  Yan Ru Pei, Ritik Shrivastava, FNU Sidharth

We present aTENNuate, a simple deep state-space autoencoder configured for efficient online enhancement of raw speech in an end-to-end fashion. The network's performance is primarily evaluated on raw speech denoising, with additional assessments on tasks such as super-resolution and de-quantization. We benchmark aTENNuate on the VoiceBank + DEMAND and the Microsoft DNS1 synthetic test sets. The network outperforms previous real-time denoising models in terms of PESQ score, parameter count, MACs, and latency. Even though it operates directly on raw waveforms, the model maintains high fidelity to the clean signal with minimal audible artifacts. In addition, the model remains performant even when the noisy input is compressed down to a 4000 Hz sampling rate and 4 bits per sample, suggesting general speech enhancement capabilities in low-resource environments. Code is available at github.com/Brainchip-Inc/aTENNuate
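To make the "4000 Hz and 4 bits" setting concrete, the following is a minimal sketch of how such a degraded input could be produced from a clean waveform. It is an illustration only, not the authors' pipeline: the 16 kHz source rate, the block-averaging decimation, the uniform quantizer, and the helper names `downsample` and `quantize` are all assumptions made here.

```python
import math

def downsample(signal, src_rate, dst_rate):
    """Crude decimation (assumed method): average each block of samples.

    Block averaging acts as a rough low-pass filter before decimation.
    """
    if src_rate % dst_rate != 0:
        raise ValueError("this sketch only handles integer decimation factors")
    factor = src_rate // dst_rate
    n_blocks = len(signal) // factor
    return [
        sum(signal[i * factor:(i + 1) * factor]) / factor
        for i in range(n_blocks)
    ]

def quantize(signal, bits):
    """Uniform quantization (assumed method) of samples in [-1, 1]."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)
    out = []
    for x in signal:
        x = max(-1.0, min(1.0, x))        # clip to the valid range
        index = round((x + 1.0) / step)   # integer code in 0 .. levels - 1
        out.append(index * step - 1.0)    # map the code back to [-1, 1]
    return out

# One second of a 440 Hz tone at an assumed 16 kHz sampling rate.
rate = 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]

# Degrade to a 4000 Hz sampling rate and 4 bits per sample.
degraded = quantize(downsample(tone, rate, 4000), bits=4)
```

An enhancement model evaluated in this setting would receive something like `degraded` and be asked to recover the full-rate, full-precision signal.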

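Task                Dataset                                 Model      Metric Name  Metric Value  Global Rank
Speech Enhancement  Deep Noise Suppression (DNS) Challenge  aTENNuate  PESQ-WB      2.98          #7
Speech Enhancement  VoiceBank + DEMAND                      aTENNuate  PESQ         3.27          #12
Speech Enhancement  VoiceBank + DEMAND                      aTENNuate  CSIG         4.57          #9
Speech Enhancement  VoiceBank + DEMAND                      aTENNuate  CBAK         2.85          #28
Speech Enhancement  VoiceBank + DEMAND                      aTENNuate  COVL         3.96          #8
Speech Enhancement  VoiceBank + DEMAND                      aTENNuate  PESQ-WB      3.27          #2
Speech Enhancement  VoiceBank + DEMAND                      aTENNuate  SI-SDR       15.04         #1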
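SI-SDR in the table is the scale-invariant signal-to-distortion ratio, reported in dB. As a hedged illustration of what that number measures, the sketch below implements the commonly used definition: project the estimate onto the reference, then compare the energy of that projection with the energy of the residual. The function name `si_sdr` is chosen here for illustration, and the optional mean-removal step some toolkits apply is omitted; the exact evaluation code behind the reported value is not shown on this page.

```python
import math

def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB for two equal-length sample sequences.

    Assumes the reference is not all zeros and the estimate is not an
    exact scaled copy of the reference (the residual would then be zero).
    """
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    scale = dot / ref_energy                      # optimal gain on the reference
    target = [scale * r for r in reference]       # projection of the estimate
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    return 10.0 * math.log10(target_energy / residual_energy)

# Toy example: a reference plus a small error orthogonal to it.
reference = [1.0, 0.0, -1.0, 0.0]
estimate = [1.0, 0.1, -1.0, -0.1]
score = si_sdr(estimate, reference)
```

Because of the projection step, rescaling the estimate by any nonzero constant leaves the score unchanged, which is what distinguishes SI-SDR from a plain SDR.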

Methods