Audio Generation

34 papers with code • 2 benchmarks • 6 datasets

Audio generation (synthesis) is the task of generating raw audio such as speech.

(Image credit: MelNet)

Most implemented papers

WaveNet: A Generative Model for Raw Audio

ibab/tensorflow-wavenet 12 Sep 2016

This paper introduces WaveNet, a deep neural network for generating raw audio waveforms.
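WaveNet's key building block is the dilated causal convolution, which lets the receptive field grow exponentially with depth while keeping each output sample conditioned only on past samples. A minimal pure-Python sketch of one such layer (illustrative only, not the paper's implementation):

```python
def dilated_causal_conv(x, weights, dilation):
    """Causal 1-D convolution: output[t] depends only on x[t], x[t-d], x[t-2d], ...
    Samples before t=0 are treated as zero (causal padding)."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out

# With dilation 2, each output mixes samples two steps apart:
signal = [1.0, 0.0, 0.0, 1.0, 0.0]
mixed = dilated_causal_conv(signal, [0.5, 0.5], dilation=2)
```

Stacking such layers with dilations 1, 2, 4, 8, ... is what lets WaveNet cover thousands of timesteps of context with few layers.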

Adversarial Audio Synthesis

chrisdonahue/wavegan ICLR 2019

Audio signals are sampled at high temporal resolutions, and learning to synthesize audio requires capturing structure across a range of timescales.

MelNet: A Generative Model for Audio in the Frequency Domain

fatchord/MelNet 4 Jun 2019

Capturing high-level structure in audio waveforms is challenging because a single second of audio spans tens of thousands of timesteps.
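MelNet sidesteps this by modeling two-dimensional mel-scaled spectrograms instead of one-dimensional waveforms, compressing tens of thousands of waveform timesteps into a few hundred frames. A sketch of the standard HTK mel mapping (an assumption for illustration; the paper's exact frontend may differ):

```python
import math

def hz_to_mel(f):
    """HTK-style mel scale: roughly linear below 1 kHz, logarithmic above."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

The logarithmic compression above 1 kHz mirrors human pitch perception, which is why mel spectrograms are a common intermediate representation for audio generation.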

Audio Super Resolution using Neural Networks

kuleshov/audio-super-res 2 Aug 2017

We introduce a new audio processing technique that increases the sampling rate of signals such as speech or music using deep convolutional neural networks.
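A useful point of reference for audio super resolution is the interpolation baseline that neural models are measured against: classical upsampling fills in new samples between existing ones but cannot recover lost high-frequency content. A minimal linear-interpolation baseline (a hypothetical sketch, not the paper's method):

```python
def upsample_linear(x, factor):
    """Baseline upsampling: insert `factor - 1` linearly interpolated samples
    between each pair of input samples. Neural super-resolution models aim to
    beat this by predicting the missing high frequencies."""
    out = []
    for i in range(len(x) - 1):
        for j in range(factor):
            t = j / factor
            out.append((1 - t) * x[i] + t * x[i + 1])
    out.append(x[-1])
    return out
```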

GANSynth: Adversarial Neural Audio Synthesis

tensorflow/magenta ICLR 2019

Efficient audio synthesis is an inherently difficult machine learning task, as human perception is sensitive to both global structure and fine-scale waveform coherence.

SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

soroushmehr/sampleRNN_ICLR2017 22 Dec 2016

In this paper we propose a novel model for unconditional audio generation based on generating one audio sample at a time.
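Sample-level autoregressive generation reduces to a loop that feeds each predicted sample back in as conditioning for the next step. A toy sketch with a stand-in predictor (the real model is a hierarchical RNN operating at multiple timescales):

```python
def generate(predict_next, seed, n_samples):
    """Autoregressive generation: each new sample is conditioned on all
    previously generated samples."""
    samples = list(seed)
    for _ in range(n_samples):
        samples.append(predict_next(samples))
    return samples

# Stand-in "model": a decaying echo of the last sample (purely illustrative).
audio = generate(lambda hist: 0.5 * hist[-1], seed=[1.0], n_samples=3)
```

This one-sample-at-a-time loop is also why such models are slow to sample from: generating one second of 16 kHz audio requires 16,000 sequential model evaluations.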

Blow: a single-scale hyperconditioned flow for non-parallel raw-audio voice conversion

liusongxiang/StarGAN-Voice-Conversion NeurIPS 2019

End-to-end models for raw audio generation are challenging, especially if they have to work with non-parallel data, which is a desirable setup in many situations.

DDSP: Differentiable Digital Signal Processing

magenta/ddsp ICLR 2020

In this paper, we introduce the Differentiable Digital Signal Processing (DDSP) library, which enables direct integration of classic signal processing elements with deep learning methods.
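A core DDSP element is the additive harmonic synthesizer: a bank of sinusoidal oscillators at integer multiples of a fundamental frequency, with learnable amplitudes. A minimal non-differentiable sketch in pure Python (the library itself implements this with TensorFlow ops; names below are illustrative):

```python
import math

def harmonic_synth(f0, harmonic_amps, n_samples, sample_rate=16000):
    """Sum of sinusoids at integer multiples of f0, one amplitude per harmonic."""
    out = []
    for n in range(n_samples):
        t = n / sample_rate
        s = sum(a * math.sin(2.0 * math.pi * f0 * (k + 1) * t)
                for k, a in enumerate(harmonic_amps))
        out.append(s)
    return out

# One 440 Hz tone, 0.1 s, with decaying harmonic amplitudes:
tone = harmonic_synth(440.0, [1.0, 0.5, 0.25], n_samples=1600)
```

Because each oscillator is a differentiable function of its parameters, a neural network in the full library can predict f0 and the harmonic amplitudes per frame and be trained end-to-end through the synthesizer.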

Differentiable Time-Frequency Scattering on GPU

cyrusvahidi/kymatio-wavespin 18 Apr 2022

Joint time-frequency scattering (JTFS) is a convolutional operator in the time-frequency domain which extracts spectrotemporal modulations at various rates and scales.

Assisted Sound Sample Generation with Musical Conditioning in Adversarial Auto-Encoders

acids-ircam/Expressive_WAE_FADER 12 Apr 2019

The model's training data subsets can be visualized directly in its 3D latent representation.