Audio Generation

20 papers with code • 1 benchmark • 6 datasets

Audio generation (synthesis) is the task of generating raw audio such as speech.

(Image credit: MelNet)

Greatest papers with code

GANSynth: Adversarial Neural Audio Synthesis

tensorflow/magenta ICLR 2019

Efficient audio synthesis is an inherently difficult machine learning task, as human perception is sensitive to both global structure and fine-scale waveform coherence.

Audio Generation

WaveNet: A Generative Model for Raw Audio

ibab/tensorflow-wavenet 12 Sep 2016

This paper introduces WaveNet, a deep neural network for generating raw audio waveforms.

Audio Generation Speech Synthesis

Music Source Separation in the Waveform Domain

facebookresearch/demucs 27 Nov 2019

Source separation for music is the task of isolating contributions, or stems, from different instruments recorded individually and arranged together to form a song.

Audio Generation Data Augmentation +3

DDSP: Differentiable Digital Signal Processing

magenta/ddsp ICLR 2020

In this paper, we introduce the Differentiable Digital Signal Processing (DDSP) library, which enables direct integration of classic signal processing elements with deep learning methods.

Audio Generation

Adversarial Audio Synthesis

chrisdonahue/wavegan ICLR 2019

Audio signals are sampled at high temporal resolutions, and learning to synthesize audio requires capturing structure across a range of timescales.

Audio Generation Image Generation

Audio Super Resolution using Neural Networks

kuleshov/audio-super-res 2 Aug 2017

We introduce a new audio processing technique that increases the sampling rate of signals such as speech or music using deep convolutional neural networks.

Audio Super-Resolution
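To make "increasing the sampling rate" concrete, here is a minimal sketch of a naive baseline: integer-factor upsampling by linear interpolation. This is not the paper's method, which uses deep convolutional networks; the function name `upsample_linear` is hypothetical and chosen only for illustration.

```python
def upsample_linear(signal, factor):
    """Upsample a list of samples by an integer factor using linear interpolation.

    Naive baseline only (an assumption for illustration), not a learned model.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    if len(signal) < 2:
        # Nothing to interpolate between; return a copy unchanged.
        return list(signal)
    out = []
    for i in range(len(signal) - 1):
        a, b = signal[i], signal[i + 1]
        # Emit the original sample (k = 0) plus factor - 1 interpolated points.
        for k in range(factor):
            out.append(a + (b - a) * k / factor)
    out.append(signal[-1])  # keep the final original sample
    return out


low_rate = [0.0, 1.0, 0.0]
high_rate = upsample_linear(low_rate, 2)
print(high_rate)
```

A baseline like this only draws straight lines between known samples, so it cannot restore high-frequency content that the low-rate signal never contained; that gap is what a learned approach would aim to fill.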

SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

soroushmehr/sampleRNN_ICLR2017 22 Dec 2016

In this paper we propose a novel model for unconditional audio generation based on generating one audio sample at a time.

Audio Generation Hierarchical structure
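As a rough illustration of generating "one audio sample at a time", the sketch below runs a generic autoregressive sampling loop. It is not SampleRNN itself: `toy_next_sample_probs` is a hypothetical stand-in for a trained network, and the 256-level quantization is an assumption made for the example.

```python
import random

NUM_LEVELS = 256  # assumed number of quantized amplitude levels


def toy_next_sample_probs(history):
    """Hypothetical stand-in for a trained model.

    Returns a probability distribution over NUM_LEVELS amplitude bins,
    favoring bins close to the most recent sample.
    """
    last = history[-1] if history else NUM_LEVELS // 2
    weights = [1.0 / (1.0 + abs(level - last)) for level in range(NUM_LEVELS)]
    total = sum(weights)
    return [w / total for w in weights]


def generate(num_samples, seed=0):
    """Draw samples one at a time, each conditioned on all previous samples."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        probs = toy_next_sample_probs(samples)
        next_sample = rng.choices(range(NUM_LEVELS), weights=probs, k=1)[0]
        samples.append(next_sample)
    return samples


audio = generate(100)
print(len(audio), min(audio), max(audio))
```

Because each step depends on the previous outputs, the loop cannot be parallelized across time, which is one reason sample-level generation is slow for long clips.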

Blow: a single-scale hyperconditioned flow for non-parallel raw-audio voice conversion

liusongxiang/StarGAN-Voice-Conversion NeurIPS 2019

End-to-end models for raw audio generation are a challenge, especially if they have to work with non-parallel data, which is a desirable setup in many situations.

Audio Generation Voice Conversion

MelNet: A Generative Model for Audio in the Frequency Domain

fatchord/MelNet 4 Jun 2019

Capturing high-level structure in audio waveforms is challenging because a single second of audio spans tens of thousands of timesteps.

Audio Generation Music Generation +2
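The "tens of thousands of timesteps" figure in the MelNet abstract follows from simple arithmetic: timesteps equal sample rate times duration. The short sketch below works this out; the sample rates listed are common values assumed for illustration, not figures taken from the paper.

```python
def num_timesteps(sample_rate_hz, duration_s):
    """Return the number of waveform samples in duration_s seconds of audio."""
    return int(round(sample_rate_hz * duration_s))


# Common sample rates (assumed for illustration).
COMMON_RATES_HZ = (16_000, 22_050, 44_100)

for rate in COMMON_RATES_HZ:
    print(f"{rate} Hz: {num_timesteps(rate, 1.0)} timesteps per second")
```

At any of these rates, even a few seconds of audio is a far longer sequence than typical text or image rows, which is the difficulty the abstract points to.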

Conditional WaveGAN

acheketa/cwavegan 27 Sep 2018

Generative models have been used successfully for image synthesis in recent years.

Audio Generation