Audio generation (synthesis) is the task of generating raw audio such as speech.
This paper introduces WaveNet, a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones; nonetheless we show that it can be efficiently trained on data with tens of thousands of samples per second of audio.
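As background for this autoregressive setup: the joint probability of a waveform x = (x_1, ..., x_T) is factorized as p(x) = ∏_t p(x_t | x_1, ..., x_{t-1}), and WaveNet realizes the conditioning with stacks of dilated causal convolutions. Below is a minimal PyTorch sketch of such a stack, not the paper's implementation; the `CausalDilatedConv1d` name, channel count, and layer depth are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalDilatedConv1d(nn.Module):
    """1-D convolution that sees only past samples, via left-only padding."""
    def __init__(self, channels, kernel_size, dilation):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation  # how far back the kernel reaches
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                  # x: (batch, channels, time)
        x = F.pad(x, (self.pad, 0))        # pad the past only, so output[t] depends on inputs <= t
        return self.conv(x)

# Doubling the dilation each layer grows the receptive field exponentially:
# ten layers with kernel_size=2 reach a receptive field of 2**10 = 1024 samples.
stack = nn.Sequential(*[CausalDilatedConv1d(32, kernel_size=2, dilation=2**i)
                        for i in range(10)])
x = torch.randn(1, 32, 16000)              # one second of 16 kHz audio, 32 feature channels
y = stack(x)                               # same length as the input, and strictly causal
```

The exponential growth in receptive field is what lets the model condition on tens of thousands of past samples while remaining cheap to train in parallel over a whole waveform.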
Audio signals are sampled at high temporal resolutions, and learning to synthesize audio requires capturing structure across a range of timescales. Generative adversarial networks (GANs) have seen wide success at generating images that are both locally and globally coherent, but they have seen little application to audio generation.
In this paper we propose a novel model for unconditional audio generation that produces one audio sample at a time. We show that our model, which combines memory-less modules (autoregressive multilayer perceptrons) with stateful recurrent neural networks in a hierarchical structure, is able to capture the underlying sources of variation in temporal sequences over very long time spans, on three datasets of different nature.
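A heavily simplified PyTorch sketch of this hierarchy follows: a stateful GRU runs once per frame of samples and carries long-range context, while a memory-less MLP maps that context plus the current frame to a distribution over the next quantized sample. The module names, tier sizes, and one-prediction-per-frame simplification are illustrative assumptions; in the paper the frame-level state is upsampled so the sample-level module runs at the full audio rate.

```python
import torch
import torch.nn as nn

FRAME_SIZE = 16     # samples per frame-level step (illustrative)
QUANT = 256         # quantization levels for the categorical output (illustrative)

class TwoTierSketch(nn.Module):
    """Toy hierarchy: a stateful RNN summarizes past frames; a memory-less MLP
    turns that summary plus recent samples into next-sample logits."""
    def __init__(self, hidden=128):
        super().__init__()
        self.frame_rnn = nn.GRU(FRAME_SIZE, hidden, batch_first=True)   # slow, stateful tier
        self.sample_mlp = nn.Sequential(                                # fast, memory-less tier
            nn.Linear(hidden + FRAME_SIZE, hidden),
            nn.ReLU(),
            nn.Linear(hidden, QUANT),
        )

    def forward(self, frames):
        # frames: (batch, n_frames, FRAME_SIZE), past audio grouped into frames
        context, _ = self.frame_rnn(frames)        # long-range memory lives in the RNN state
        logits = self.sample_mlp(torch.cat([context, frames], dim=-1))
        return logits                              # (batch, n_frames, QUANT)

model = TwoTierSketch()
out = model(torch.randn(2, 40, FRAME_SIZE))        # 2 sequences of 40 frames each
```

The division of labor is the point: the RNN only has to advance once per frame, which makes very long spans tractable, while the cheap MLP handles the fine, sample-level detail.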
Dilated convolutions, also known as atrous convolutions, have been widely explored in deep convolutional neural networks (DCNNs) for tasks such as semantic image segmentation, object detection, audio generation, video modeling, and machine translation. However, dilated convolutions suffer from gridding artifacts, which hamper the performance of DCNNs that use them.
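To make the gridding issue concrete, here is a minimal PyTorch demonstration (not code from the cited work): a dilated kernel samples its input on a sparse grid, so feeding an impulse through it reveals the gaps between the positions the kernel actually touches.

```python
import torch
import torch.nn as nn

# A dilated (atrous) convolution inserts gaps between kernel taps:
# with kernel_size=3 and dilation=2, each output sees inputs t-4, t-2, t.
conv = nn.Conv1d(1, 1, kernel_size=3, dilation=2, padding=2, bias=False)

x = torch.zeros(1, 1, 16)
x[0, 0, 8] = 1.0                   # a single impulse in the middle of the signal
with torch.no_grad():
    conv.weight.fill_(1.0)         # all-ones kernel, to expose the sampling pattern
    y = conv(x)
print(y.nonzero())                 # responses only at positions 6, 8, 10: every
                                   # other position is skipped by the sparse grid
```

Because adjacent outputs draw on disjoint subsets of input positions, stacking layers with the same dilation never mixes those subsets, producing the checkerboard-like "gridding" pattern the paper sets out to smooth away.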