Audio Generation

5 papers with code · Audio

Audio generation (synthesis) is the task of generating raw audio such as speech.

State-of-the-art leaderboards

No evaluation results yet. Help compare methods by submitting evaluation metrics.

Greatest papers with code

WaveNet: A Generative Model for Raw Audio

12 Sep 2016 buriburisuri/speech-to-text-wavenet

This paper introduces WaveNet, a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones; nonetheless we show that it can be efficiently trained on data with tens of thousands of samples per second of audio.

AUDIO GENERATION · SPEECH SYNTHESIS
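
The WaveNet abstract above describes an autoregressive model, in which the distribution of each audio sample is conditioned on all previous samples. The following is a minimal sketch of that idea only, not the paper's network: `toy_conditional` is a hypothetical hand-written stand-in for a trained model, and `LEVELS` is an arbitrary toy number of quantized amplitude levels. It shows the chain-rule log-likelihood and sample-by-sample generation.

```python
import math
import random

LEVELS = 4  # assumption: toy number of quantized amplitude levels


def toy_conditional(history):
    """Hypothetical stand-in for a trained network: p(x_t | x_1..x_{t-1}).

    It simply favours values close to the previous sample.
    """
    prev = history[-1] if history else 0
    weights = [math.exp(-abs(level - prev)) for level in range(LEVELS)]
    total = sum(weights)
    return [w / total for w in weights]


def log_likelihood(samples):
    """Chain rule: log p(x) = sum over t of log p(x_t | all previous samples)."""
    return sum(
        math.log(toy_conditional(samples[:t])[x]) for t, x in enumerate(samples)
    )


def generate(n, rng):
    """Draw samples one at a time, feeding each one back in as context."""
    samples = []
    for _ in range(n):
        probs = toy_conditional(samples)
        samples.append(rng.choices(range(LEVELS), weights=probs)[0])
    return samples


rng = random.Random(0)
audio = generate(16, rng)
```

Generation here is inherently sequential because each draw needs the previous ones, whereas `log_likelihood` can score a known sequence position by position, which is one reason such models can be trained on existing recordings.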

Adversarial Audio Synthesis

ICLR 2019 chrisdonahue/wavegan

Audio signals are sampled at high temporal resolutions, and learning to synthesize audio requires capturing structure across a range of timescales. Generative adversarial networks (GANs) have seen wide success at generating images that are both locally and globally coherent, but they have seen little application to audio generation.

AUDIO GENERATION · IMAGE GENERATION

SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

22 Dec 2016 soroushmehr/sampleRNN_ICLR2017

In this paper we propose a novel model for unconditional audio generation based on generating one audio sample at a time. We show that our model, which profits from combining memory-less modules, namely autoregressive multilayer perceptrons, and stateful recurrent neural networks in a hierarchical structure, is able to capture underlying sources of variations in the temporal sequences over very long time spans, on three datasets of different nature.

AUDIO GENERATION

Conditional WaveGAN

27 Sep 2018 acheketa/cwavegan

Generative models have been used successfully for image synthesis in recent years. But when it comes to other modalities such as audio and text, little progress has been made.

AUDIO GENERATION

Smoothed Dilated Convolutions for Improved Dense Prediction

27 Aug 2018 divelab/dilated

Dilated convolutions, also known as atrous convolutions, have been widely explored in deep convolutional neural networks (DCNNs) for various tasks like semantic image segmentation, object detection, audio generation, video modeling, and machine translation. However, dilated convolutions suffer from gridding artifacts, which hamper the performance of DCNNs with dilated convolutions.

AUDIO GENERATION · MACHINE TRANSLATION · OBJECT DETECTION · SEMANTIC SEGMENTATION
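
To make the dilated convolution in the abstract above concrete, here is a minimal single-layer, one-dimensional sketch in plain Python. The function names, the toy signal, and the kernel are illustrative assumptions, and the paper's smoothing method is not implemented here. The helper `input_indices` shows one common way to see gridding: with a dilation rate above 1, neighbouring outputs read from disjoint sets of inputs.

```python
def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1-D dilated convolution (cross-correlation form, no padding)."""
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    return [
        sum(w * signal[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


def input_indices(i, kernel_size, dilation):
    """Input positions that output position i reads from."""
    return {i + j * dilation for j in range(kernel_size)}


signal = [1, 2, 3, 4, 5, 6, 7, 8]  # toy input
kernel = [1, 0, -1]                # toy 3-tap kernel

dense = dilated_conv1d(signal, kernel, dilation=1)    # taps at i, i+1, i+2
dilated = dilated_conv1d(signal, kernel, dilation=2)  # taps at i, i+2, i+4

# Neighbouring outputs share inputs when dilation is 1 ...
shared_dense = input_indices(0, 3, 1) & input_indices(1, 3, 1)
# ... but read from disjoint inputs when dilation is 2.
shared_dilated = input_indices(0, 3, 2) & input_indices(1, 3, 2)
```

With a kernel of size k and dilation rate r, each output spans (k - 1) * r + 1 input positions, so the receptive field grows with r without adding weights, at the cost of skipping the inputs in between.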