Audio Generation

75 papers with code • 3 benchmarks • 9 datasets

Audio generation (synthesis) is the task of generating raw audio, such as speech or music.

(Image credit: MelNet)


Most implemented papers

WaveNet: A Generative Model for Raw Audio

ibab/tensorflow-wavenet • 12 Sep 2016

This paper introduces WaveNet, a deep neural network for generating raw audio waveforms.

Adversarial Audio Synthesis

chrisdonahue/wavegan • ICLR 2019

Audio signals are sampled at high temporal resolutions, and learning to synthesize audio requires capturing structure across a range of timescales.

GANSynth: Adversarial Neural Audio Synthesis

tensorflow/magenta • ICLR 2019

Efficient audio synthesis is an inherently difficult machine learning task, as human perception is sensitive to both global structure and fine-scale waveform coherence.

It's Raw! Audio Generation with State-Space Models

hazyresearch/state-spaces • 20 Feb 2022

SaShiMi yields state-of-the-art performance for unconditional waveform generation in the autoregressive setting.

MelNet: A Generative Model for Audio in the Frequency Domain

fatchord/MelNet • 4 Jun 2019

Capturing high-level structure in audio waveforms is challenging because a single second of audio spans tens of thousands of timesteps.
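
To make that scale concrete, here is a small sketch of the arithmetic. The sampling rates are common choices picked for illustration and are an assumption here, not values taken from the MelNet paper; `timesteps` is a hypothetical helper name.

```python
# Number of waveform timesteps in a clip = sampling rate (Hz) * duration (s).
# The rates below are illustrative assumptions, not figures from the paper.
COMMON_RATES_HZ = {"16 kHz": 16_000, "22.05 kHz": 22_050, "44.1 kHz": 44_100}


def timesteps(sample_rate_hz: int, seconds: float) -> int:
    """Return how many samples a clip of the given length contains."""
    return int(round(sample_rate_hz * seconds))


for label, rate in COMMON_RATES_HZ.items():
    print(f"{label}: {timesteps(rate, 1.0):,} timesteps per second")
```

Even at the lowest of these rates, a model operating directly on the waveform has to relate samples that are thousands of steps apart to capture structure lasting a fraction of a second.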

AudioLM: a Language Modeling Approach to Audio Generation

suno-ai/bark • 7 Sep 2022

We introduce AudioLM, a framework for high-quality audio generation with long-term consistency.

SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

soroushmehr/sampleRNN_ICLR2017 • 22 Dec 2016

In this paper we propose a novel model for unconditional audio generation based on generating one audio sample at a time.
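
Generating "one audio sample at a time" can be pictured as a simple autoregressive loop. The sketch below is not SampleRNN's architecture: `toy_next_sample_distribution` is a hypothetical stand-in for a trained network, and the 256 quantization levels are an assumption chosen for illustration.

```python
import random


def toy_next_sample_distribution(history, num_levels=256):
    """Hypothetical stand-in for a trained model (assumption, not SampleRNN).

    Returns one unnormalized weight per quantization level, favoring
    levels close to the most recent sample.
    """
    prev = history[-1] if history else num_levels // 2
    return [1.0 / (1.0 + abs(level - prev)) for level in range(num_levels)]


def generate(num_samples, num_levels=256, seed=0):
    """Autoregressive sampling: each sample is drawn given all earlier ones."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        weights = toy_next_sample_distribution(samples, num_levels)
        next_sample = rng.choices(range(num_levels), weights=weights, k=1)[0]
        samples.append(next_sample)  # the new sample becomes part of the context
    return samples


waveform = generate(num_samples=100)
```

Because every step consumes the output of the previous one, generation in this style proceeds sequentially over time.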

Audio Super Resolution using Neural Networks

kuleshov/audio-super-res • 2 Aug 2017

We introduce a new audio processing technique that increases the sampling rate of signals such as speech or music using deep convolutional neural networks.
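
For orientation, the task of increasing a signal's sampling rate can be illustrated with a naive, non-learned baseline. The sketch below uses plain linear interpolation; it is not the paper's neural method, and `upsample_linear` is a hypothetical helper introduced only for this example.

```python
def upsample_linear(signal, factor):
    """Increase the sampling rate by an integer factor via linear interpolation.

    Naive baseline for illustration only; a learned model would instead
    predict the missing high-resolution samples.
    """
    if factor < 1 or len(signal) < 2:
        return list(signal)
    out = []
    for a, b in zip(signal, signal[1:]):
        # Emit the original sample, then (factor - 1) evenly spaced points
        # on the straight line toward the next sample.
        for k in range(factor):
            out.append(a + (b - a) * k / factor)
    out.append(signal[-1])
    return out


low_rate = [0.0, 1.0, 0.0, -1.0]
high_rate = upsample_linear(low_rate, factor=2)
print(high_rate)
```

Interpolation of this kind only fills in points between existing samples; it cannot restore high-frequency content that the low-rate signal never contained.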

High-Fidelity Audio Compression with Improved RVQGAN

descriptinc/descript-audio-codec • NeurIPS 2023

Language models have been successfully used to model natural signals, such as images, speech, and music.

Assisted Sound Sample Generation with Musical Conditioning in Adversarial Auto-Encoders

acids-ircam/Expressive_WAE_FADER • 12 Apr 2019

The model's training data subsets can be visualized directly in its 3D latent representation.