Audio Generation
113 papers with code • 3 benchmarks • 10 datasets
Audio generation (synthesis) is the task of generating raw audio such as speech.
Most implemented papers
WaveNet: A Generative Model for Raw Audio
This paper introduces WaveNet, a deep neural network for generating raw audio waveforms.
Adversarial Audio Synthesis
Audio signals are sampled at high temporal resolutions, and learning to synthesize audio requires capturing structure across a range of timescales.
GANSynth: Adversarial Neural Audio Synthesis
Efficient audio synthesis is an inherently difficult machine learning task, as human perception is sensitive to both global structure and fine-scale waveform coherence.
It's Raw! Audio Generation with State-Space Models
SaShiMi yields state-of-the-art performance for unconditional waveform generation in the autoregressive setting.
MelNet: A Generative Model for Audio in the Frequency Domain
Capturing high-level structure in audio waveforms is challenging because a single second of audio spans tens of thousands of timesteps.
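To make the "tens of thousands of timesteps" figure concrete, here is a minimal arithmetic sketch. The sampling rates are common values chosen for illustration, not figures taken from the MelNet paper, and `num_timesteps` is a hypothetical helper name.

```python
# Count the waveform samples (timesteps) in a clip at a few common sampling rates.
# Assumption: these rates are illustrative and are not quoted from the MelNet paper.
COMMON_SAMPLE_RATES_HZ = (16_000, 22_050, 44_100)


def num_timesteps(duration_s: float, sample_rate_hz: int) -> int:
    """Return how many samples a clip of the given duration contains."""
    return round(duration_s * sample_rate_hz)


# One second of audio at each rate: the count equals the sampling rate itself.
timesteps_per_second = {sr: num_timesteps(1.0, sr) for sr in COMMON_SAMPLE_RATES_HZ}

for rate, steps in timesteps_per_second.items():
    print(f"{rate} Hz -> {steps} timesteps per second")
```

Under these assumptions, a ten-second clip at 22,050 Hz already spans 220,500 timesteps, which is why models that operate directly on waveforms struggle to capture long-range structure.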
BigVGAN: A Universal Neural Vocoder with Large-Scale Training
Despite recent progress in generative adversarial network (GAN)-based vocoders, where the model generates raw waveform conditioned on acoustic features, it is challenging to synthesize high-fidelity audio for numerous speakers across various recording environments.
AudioLM: a Language Modeling Approach to Audio Generation
We introduce AudioLM, a framework for high-quality audio generation with long-term consistency.
SampleRNN: An Unconditional End-to-End Neural Audio Generation Model
In this paper we propose a novel model for unconditional audio generation based on generating one audio sample at a time.
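Generating "one audio sample at a time" can be sketched as a plain autoregressive loop. This is not the SampleRNN architecture: `toy_next_sample_distribution` is a hypothetical stand-in for a trained network, and the 8-bit quantization (`NUM_LEVELS = 256`) is an assumption made for illustration.

```python
import random
from typing import Callable, List, Sequence

NUM_LEVELS = 256  # assumption: amplitudes quantized to 8 bits


def toy_next_sample_distribution(history: Sequence[int]) -> List[float]:
    """Hypothetical stand-in for a trained model.

    Returns a probability distribution over the next quantized sample,
    concentrated near the previous sample so the output stays smooth.
    """
    prev = history[-1] if history else NUM_LEVELS // 2
    weights = [1.0 / (1.0 + abs(level - prev)) for level in range(NUM_LEVELS)]
    total = sum(weights)
    return [w / total for w in weights]


def generate(
    num_samples: int,
    model: Callable[[Sequence[int]], List[float]] = toy_next_sample_distribution,
    seed: int = 0,
) -> List[int]:
    """Draw samples one at a time, each conditioned on all previous samples."""
    rng = random.Random(seed)
    samples: List[int] = []
    for _ in range(num_samples):
        probs = model(samples)  # condition on everything generated so far
        next_sample = rng.choices(range(NUM_LEVELS), weights=probs, k=1)[0]
        samples.append(next_sample)  # feed the draw back in as context
    return samples


waveform = generate(100)
```

Each iteration depends on the output of the previous one, so generation cost grows with clip length. That sequential dependency is the main practical drawback of sample-level autoregressive models.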
Audio Super Resolution using Neural Networks
We introduce a new audio processing technique that increases the sampling rate of signals such as speech or music using deep convolutional neural networks.
HiFi++: a Unified Framework for Bandwidth Extension and Speech Enhancement
Generative adversarial networks have recently demonstrated outstanding performance in neural vocoding, outperforming the best autoregressive and flow-based models.