Speech Enhancement

214 papers with code • 12 benchmarks • 19 datasets

Speech Enhancement is a signal processing task that improves the quality of speech signals captured under noisy or degraded conditions. The goal is to make speech clearer, more intelligible, and more pleasant to listen to, for applications such as voice recognition, teleconferencing, and hearing aids.
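As a rough illustration of the task, the sketch below applies spectral subtraction, a classical baseline that is not taken from any paper listed on this page. It assumes a separate noise-only recording is available for estimating the noise spectrum, and it uses a synthetic tone plus white noise in place of real speech. The function names (`spectral_subtraction`, `snr_db`) and all parameter values are illustrative choices.

```python
import numpy as np

def spectral_subtraction(noisy, noise_only, frame_len=256, hop=128):
    """Subtract an average noise magnitude spectrum from each frame of `noisy`."""
    # Periodic Hann window: copies shifted by half a frame sum to 1,
    # so plain overlap-add reconstructs the interior of the signal.
    window = np.hanning(frame_len + 1)[:-1]

    def windowed_frames(x):
        n_frames = 1 + (len(x) - frame_len) // hop
        return np.stack([x[i * hop : i * hop + frame_len] * window
                         for i in range(n_frames)])

    # Noise estimate: mean magnitude per frequency bin over the noise-only signal.
    noise_mag = np.abs(np.fft.rfft(windowed_frames(noise_only), axis=1)).mean(axis=0)

    spec = np.fft.rfft(windowed_frames(noisy), axis=1)
    # Subtract the noise magnitude, floor at zero, and keep the noisy phase.
    clean_mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
    frames = np.fft.irfft(clean_mag * np.exp(1j * np.angle(spec)),
                          n=frame_len, axis=1)

    # Overlap-add the processed frames back into a waveform.
    out = np.zeros(len(noisy))
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + frame_len] += frame
    return out

def snr_db(reference, estimate):
    """Signal-to-noise ratio of `estimate` against a clean `reference`, in dB."""
    error = estimate - reference
    return 10 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2))

# Synthetic demo: a 440 Hz tone stands in for speech, white noise for the degradation.
rng = np.random.default_rng(0)
sr = 8000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 440 * t)
noisy = clean + 0.3 * rng.standard_normal(sr)
noise_only = 0.3 * rng.standard_normal(sr)

enhanced = spectral_subtraction(noisy, noise_only)

# Compare away from the edges, where overlap-add is incomplete.
interior = slice(256, -512)
snr_before = snr_db(clean[interior], noisy[interior])
snr_after = snr_db(clean[interior], enhanced[interior])
print(f"SNR before: {snr_before:.1f} dB, after: {snr_after:.1f} dB")
```

Signal-to-noise ratio is used here only because the clean signal is known in this toy setup. It says little about intelligibility or listening comfort, which the task description also names as goals.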

(Image credit: A Fully Convolutional Neural Network For Speech Enhancement)

Most implemented papers

VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking

Edresson/VoiceSplit 11 Oct 2018

In this paper, we present a novel system that separates the voice of a target speaker from multi-speaker signals, by making use of a reference signal from the target speaker.
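The masking step named in the title can be sketched as follows. This is a minimal sketch, not the VoiceFilter implementation: the system described above would predict the mask from the mixture and a reference signal of the target speaker, whereas this example assumes an oracle ratio mask computed from known sources. Two synthetic tones stand in for the two speakers, and the helper names (`stft`, `apply_mask`) are hypothetical.

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    """Short-time Fourier transform with a Hann window; returns (frames, bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def apply_mask(mixture_spec, mask):
    """Scale the mixture magnitude by a [0, 1] mask and reuse the mixture phase."""
    return mask * np.abs(mixture_spec) * np.exp(1j * np.angle(mixture_spec))

# Synthetic stand-ins for a target speaker and an interfering speaker.
sr = 8000
t = np.arange(sr) / sr
target = np.sin(2 * np.pi * 440 * t)
interferer = 0.8 * np.sin(2 * np.pi * 1500 * t)
mixture = target + interferer

target_spec = stft(target)
interferer_spec = stft(interferer)
mixture_spec = stft(mixture)

# ASSUMPTION: oracle ratio mask built from the known sources. A real system
# has no access to these and must estimate the mask with a trained model.
eps = 1e-8
mask = np.abs(target_spec) / (np.abs(target_spec) + np.abs(interferer_spec) + eps)

estimate_spec = apply_mask(mixture_spec, mask)

# Mean squared magnitude error against the target, before and after masking.
err_before = np.mean((np.abs(mixture_spec) - np.abs(target_spec)) ** 2)
err_after = np.mean((np.abs(estimate_spec) - np.abs(target_spec)) ** 2)
print(f"magnitude MSE before: {err_before:.3f}, after: {err_after:.3f}")
```

Reusing the mixture phase keeps the sketch short. Only the magnitude is corrected by the mask, so any phase distortion introduced by the interferer remains in the estimate.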

Language and Noise Transfer in Speech Enhancement Generative Adversarial Network

santi-pdp/segan_pytorch 18 Dec 2017

In this work, we present the results of adapting a speech enhancement generative adversarial network by finetuning the generator with small amounts of data.

Whispered-to-voiced Alaryngeal Speech Conversion with Generative Adversarial Networks

santi-pdp/segan_pytorch 31 Aug 2018

Most methods of voice restoration for patients suffering from aphonia either produce whispered or monotone speech.

Improved Speech Enhancement with the Wave-U-Net

craigmacartney/Wave-U-Net-For-Speech-Enhancement 27 Nov 2018

We study the use of the Wave-U-Net architecture for speech enhancement, a model introduced by Stoller et al. for the separation of music vocals and accompaniment.

rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method

zhenghuatan/rVAD 9 Jun 2019

In the end, a posteriori SNR weighted energy difference is applied to the extended pitch segments of the denoised speech signal for detecting voice activity.

Spleeter: A Fast And State-of-the Art Music Source Separation Tool With Pre-trained Models

deezer/spleeter ISMIR 2019 Late-Breaking/Demo 2019

We present and release a new tool for music source separation with pre-trained models called Spleeter. Spleeter was designed with ease of use, separation performance and speed in mind.

Real Time Speech Enhancement in the Waveform Domain

facebookresearch/denoiser 23 Jun 2020

The proposed model matches state-of-the-art performance of both causal and non-causal methods while working directly on the raw waveform.

MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement

speechbrain/speechbrain 8 Apr 2021

The discrepancy between the cost function used for training a speech enhancement model and human auditory perception usually makes the quality of enhanced speech unsatisfactory.

SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing

microsoft/speecht5 ACL 2022

Motivated by the success of T5 (Text-To-Text Transfer Transformer) in pre-trained natural language processing models, we propose a unified-modal SpeechT5 framework that explores the encoder-decoder pre-training for self-supervised speech/text representation learning.

Speech Denoising Convolutional Neural Network trained with Deep Feature Losses

francoisgermain/SpeechDenoisingWithDeepFeatureLosses Interspeech 2018

We present an end-to-end deep learning approach to denoising speech signals by processing the raw waveform directly.