Audio Generation

113 papers with code • 3 benchmarks • 10 datasets

Audio generation (synthesis) is the task of generating raw audio such as speech.

Most implemented papers

WaveNet: A Generative Model for Raw Audio

ibab/tensorflow-wavenet 12 Sep 2016

This paper introduces WaveNet, a deep neural network for generating raw audio waveforms.

Adversarial Audio Synthesis

chrisdonahue/wavegan ICLR 2019

Audio signals are sampled at high temporal resolutions, and learning to synthesize audio requires capturing structure across a range of timescales.

GANSynth: Adversarial Neural Audio Synthesis

tensorflow/magenta ICLR 2019

Efficient audio synthesis is an inherently difficult machine learning task, as human perception is sensitive to both global structure and fine-scale waveform coherence.

It's Raw! Audio Generation with State-Space Models

hazyresearch/state-spaces 20 Feb 2022

SaShiMi yields state-of-the-art performance for unconditional waveform generation in the autoregressive setting.

MelNet: A Generative Model for Audio in the Frequency Domain

fatchord/MelNet 4 Jun 2019

Capturing high-level structure in audio waveforms is challenging because a single second of audio spans tens of thousands of timesteps.
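To make the "tens of thousands of timesteps" figure concrete, here is a minimal sketch of the arithmetic. The sampling rates are common values chosen for illustration and do not come from the listing or the paper; `num_timesteps` is a hypothetical helper name.

```python
# Number of waveform timesteps in a clip at a given sampling rate.
# ASSUMPTION: the rates below are common illustrative values, not taken
# from the MelNet paper or this listing.
SAMPLE_RATES_HZ = (16_000, 22_050, 44_100)


def num_timesteps(duration_s: float, sample_rate_hz: int) -> int:
    """Return the number of samples in a clip of the given duration."""
    return int(round(duration_s * sample_rate_hz))


# One second of audio at each assumed rate.
timesteps_per_second = {sr: num_timesteps(1.0, sr) for sr in SAMPLE_RATES_HZ}

if __name__ == "__main__":
    for rate, steps in timesteps_per_second.items():
        print(f"{rate} Hz: {steps} timesteps per second")
```

At any of these rates, a model operating directly on the waveform has to relate samples that are tens of thousands of steps apart to capture structure lasting a second or more.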

BigVGAN: A Universal Neural Vocoder with Large-Scale Training

nvidia/bigvgan 9 Jun 2022

Despite recent progress in generative adversarial network (GAN)-based vocoders, where the model generates raw waveform conditioned on acoustic features, it is challenging to synthesize high-fidelity audio for numerous speakers across various recording environments.

AudioLM: a Language Modeling Approach to Audio Generation

suno-ai/bark 7 Sep 2022

We introduce AudioLM, a framework for high-quality audio generation with long-term consistency.

SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

soroushmehr/sampleRNN_ICLR2017 22 Dec 2016

In this paper we propose a novel model for unconditional audio generation based on generating one audio sample at a time.
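The idea of generating one audio sample at a time can be sketched as a plain autoregressive sampling loop. This is not SampleRNN's architecture: `toy_next_sample_distribution` is a hypothetical stand-in for a trained network, and the 256 quantization levels are an assumption made for illustration.

```python
import math
import random


def toy_next_sample_distribution(history, num_levels=256):
    """Hypothetical stand-in for a trained model.

    Returns one probability per quantized amplitude level, given the
    samples generated so far. Here it is just a bell curve centered on
    the previous sample (or on the middle level when there is no history).
    """
    center = history[-1] if history else num_levels // 2
    weights = [
        math.exp(-0.5 * ((level - center) / 4.0) ** 2)
        for level in range(num_levels)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def generate(num_samples, num_levels=256, seed=0):
    """Draw samples one at a time, each conditioned on all previous ones."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        probs = toy_next_sample_distribution(samples, num_levels)
        next_level = rng.choices(range(num_levels), weights=probs, k=1)[0]
        samples.append(next_level)  # the new sample becomes part of the context
    return samples


waveform = generate(100)
```

Because each step depends on the output of the previous one, the loop cannot be parallelized across time, which is why sample-level autoregressive generation tends to be slow at audio sampling rates.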

Audio Super Resolution using Neural Networks

kuleshov/audio-super-res 2 Aug 2017

We introduce a new audio processing technique that increases the sampling rate of signals such as speech or music using deep convolutional neural networks.

HiFi++: a Unified Framework for Bandwidth Extension and Speech Enhancement

andreevp/wvmos 24 Mar 2022

Generative adversarial networks have recently demonstrated outstanding performance in neural vocoding, outperforming the best autoregressive and flow-based models.