Speech Enhancement
217 papers with code • 12 benchmarks • 19 datasets
Speech Enhancement is a signal processing task that involves improving the quality of speech signals captured under noisy or degraded conditions. The goal of speech enhancement is to make speech signals clearer, more intelligible, and more pleasant to listen to; enhanced speech is useful in applications such as voice recognition, teleconferencing, and hearing aids.
(Image credit: A Fully Convolutional Neural Network For Speech Enhancement)
Latest papers
FSPEN: AN ULTRA-LIGHTWEIGHT NETWORK FOR REAL TIME SPEECH ENHANCEMENT
Deep learning-based speech enhancement methods have shown promising results in recent years.
How to train your ears: Auditory-model emulation for large-dynamic-range inputs and mild-to-severe hearing losses
Our results show that this new optimization objective significantly improves the emulation performance of deep neural networks across relevant input sound levels and auditory-model frequency channels, without increasing the computational load during inference.
Binaural Speech Enhancement Using Deep Complex Convolutional Transformer Networks
Studies have shown that in noisy acoustic environments, providing binaural signals to the user of an assistive listening device may improve speech intelligibility and spatial awareness.
Self-Supervised Speech Quality Estimation and Enhancement Using Only Clean Speech
To improve the robustness of the encoder for SE, a novel self-distillation mechanism combined with adversarial training is introduced.
Improving Design of Input Condition Invariant Speech Enhancement
In this paper we propose novel architectures to improve the input condition invariant SE model so that performance in simulated conditions remains competitive while real condition degradation is much mitigated.
A Two-Stage Framework in Cross-Spectrum Domain for Real-Time Speech Enhancement
Two-stage pipelines are popular in speech enhancement tasks due to their superiority over traditional single-stage methods.
A Refining Underlying Information Framework for Monaural Speech Enhancement
By bridging speech enhancement and the Information Bottleneck principle in this letter, we rethink a universal plug-and-play strategy and propose a Refining Underlying Information framework, called RUI, to rise to these challenges in both theory and practice.
D4AM: A General Denoising Framework for Downstream Acoustic Models
To our knowledge, this is the first work that deploys an effective combination scheme of regression (denoising) and classification (ASR) objectives to derive a general pre-processor applicable to various unseen ASR systems.
TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch
TorchAudio is an open-source audio and speech processing library built for PyTorch.
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT
In this paper, we propose LauraGPT, a unified GPT model for audio recognition, understanding, and generation.