Audio Super-Resolution
16 papers with code • 4 benchmarks • 3 datasets
Audio super-resolution, particularly for speech, is the task of reconstructing a high-resolution audio signal from its low-resolution counterpart. In effect, it enhances the quality of a signal by increasing its sampling rate or bandwidth while preserving naturalness and intelligibility. A representative GitHub project for speech super-resolution is ClearerVoice-Studio.
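To make the problem setup concrete, here is a minimal sketch using naive bandlimited resampling as the baseline that learned models improve on. The sample rates and random signal are illustrative assumptions, not tied to any paper below.

```python
# Minimal sketch of the audio super-resolution setup.
import numpy as np
from scipy.signal import resample_poly

sr_low, sr_high = 8_000, 16_000      # low- and high-resolution sampling rates
lowres = np.random.randn(sr_low)     # stand-in for 1 s of 8 kHz speech

# Naive bandlimited interpolation: matches the target sampling rate but adds
# no energy above the original 4 kHz Nyquist limit.
baseline = resample_poly(lowres, sr_high, sr_low)

# A super-resolution model would instead *predict* the missing high-frequency
# band, so the output sounds like genuine 16 kHz speech:
# highres = model(lowres)            # hypothetical model call
print(baseline.shape)                # (16000,)
```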
Most implemented papers
NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates
Conventionally, audio super-resolution models fixed the initial and target sampling rates, which necessitates training a separate model for each pair of sampling rates.
Audio Super Resolution using Neural Networks
We introduce a new audio processing technique that increases the sampling rate of signals such as speech or music using deep convolutional neural networks.
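The core operation in such convolutional upsamplers can be sketched as a 1-D subpixel layer that predicts several output samples per input sample and interleaves them along time. The block below is a toy illustration under that assumption, not the paper's exact U-Net architecture.

```python
# Toy 1-D subpixel upsampling block (illustrative, not the paper's model).
import torch
import torch.nn as nn

class UpsampleBlock(nn.Module):
    def __init__(self, channels: int, scale: int = 2):
        super().__init__()
        # Predict `scale` output samples per input sample.
        self.conv = nn.Conv1d(channels, channels * scale, kernel_size=9, padding=4)
        self.scale = scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, t = x.shape
        y = self.conv(x)                        # (b, c*scale, t)
        y = y.view(b, c, self.scale, t)         # split out the scale factor
        # Interleave predictions along time: (b, c, t*scale).
        return y.permute(0, 1, 3, 2).reshape(b, c, t * self.scale)

x = torch.randn(1, 32, 4000)                    # (batch, channels, time)
print(UpsampleBlock(32)(x).shape)               # torch.Size([1, 32, 8000])
```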
NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling
In this work, we introduce NU-Wave, the first neural audio upsampling model to produce waveforms of sampling rate 48kHz from coarse 16kHz or 24kHz inputs, while prior works could generate only up to 16kHz.
On Filter Generalization for Music Bandwidth Extension Using Deep Neural Networks
In this paper, we address a sub-topic of the broad domain of audio enhancement, namely musical audio bandwidth extension.
CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement
Rather than focusing exclusively on the speech denoising task, we extend this work to address the dereverberation and super-resolution tasks.
Temporal FiLM: Capturing Long-Range Sequence Dependencies with Feature-Wise Modulations
Learning representations that accurately capture long-range dependencies in sequential inputs -- including text, audio, and genomic data -- is a key problem in deep learning.
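A hedged sketch of the feature-wise modulation idea: pool the sequence into blocks, let an RNN summarize the blocks, and emit a per-block scale and shift that modulate the local features. Block size and dimensions here are illustrative assumptions.

```python
# Sketch of temporal feature-wise linear modulation (TFiLM-style).
import torch
import torch.nn as nn

class TFiLM(nn.Module):
    def __init__(self, channels: int, block_size: int):
        super().__init__()
        self.block_size = block_size
        # The RNN sees one pooled summary per block and outputs 2*channels
        # values per block: a scale (gamma) and a shift (beta).
        self.rnn = nn.LSTM(channels, 2 * channels, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, t = x.shape
        n = t // self.block_size
        blocks = x.view(b, c, n, self.block_size)
        pooled = blocks.max(dim=-1).values.transpose(1, 2)        # (b, n, c)
        gamma_beta, _ = self.rnn(pooled)                          # (b, n, 2c)
        gamma, beta = gamma_beta.transpose(1, 2).chunk(2, dim=1)  # (b, c, n)
        out = blocks * gamma.unsqueeze(-1) + beta.unsqueeze(-1)
        return out.view(b, c, t)

x = torch.randn(2, 16, 1024)
print(TFiLM(16, block_size=128)(x).shape)       # torch.Size([2, 16, 1024])
```

Because the modulation parameters come from a recurrent pass over block summaries, each block's features are conditioned on context far outside a convolution's receptive field.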
Self-Attention for Audio Super-Resolution
Convolutions operate only locally, thus failing to model global interactions.
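As an illustration of the contrast, the snippet below applies self-attention over a sequence of feature frames so that every position can attend to every other position in a single layer; the shapes and hyperparameters are arbitrary assumptions.

```python
# Self-attention over feature frames: global interactions in one layer.
import torch
import torch.nn as nn

frames = torch.randn(1, 500, 64)                # (batch, time frames, channels)
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)

# Each output frame is a weighted mix of *all* input frames, so long-range
# structure (e.g., harmonics recurring across the utterance) is visible
# without stacking many local convolutions.
out, weights = attn(frames, frames, frames)
print(out.shape, weights.shape)                 # (1, 500, 64) (1, 500, 500)
```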
TUNet: A Block-online Bandwidth Extension Model based on Transformers and Self-supervised Pretraining
We introduce a block-online variant of the temporal feature-wise linear modulation (TFiLM) model to achieve bandwidth extension.
Learning Continuous Representation of Audio for Arbitrary Scale Super Resolution
To obtain a continuous representation of audio and enable super resolution for arbitrary scale factor, we propose a method of implicit neural representation, coined Local Implicit representation for Super resolution of Arbitrary scale (LISA).
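One toy reading of the implicit-representation idea, under loudly stated assumptions (this is not the paper's LISA implementation): a decoder MLP maps a local latent feature plus a continuous time coordinate to an amplitude, so the waveform can be queried at any sampling rate.

```python
# Toy continuous audio representation: query a latent code at arbitrary times.
import torch
import torch.nn as nn

latent_dim = 32
decoder = nn.Sequential(                        # hypothetical decoder MLP
    nn.Linear(latent_dim + 1, 128), nn.ReLU(),
    nn.Linear(128, 1),
)

z = torch.randn(latent_dim)                     # latent code for one local chunk
scale = 3                                       # arbitrary upsampling factor
t = torch.linspace(0.0, 1.0, 100 * scale)       # continuous query times

# Query the same latent code at a denser grid of time coordinates; a coarser
# or finer grid gives any other output rate from the same representation.
queries = torch.cat([z.expand(len(t), -1), t.unsqueeze(-1)], dim=-1)
samples = decoder(queries).squeeze(-1)
print(samples.shape)                            # torch.Size([300])
```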