Audio Synthesis
52 papers with code • 1 benchmark • 2 datasets
Libraries
Use these libraries to find Audio Synthesis models and implementations.

Most implemented papers
An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
Our results indicate that a simple convolutional architecture outperforms canonical recurrent networks such as LSTMs across a diverse range of tasks and datasets, while demonstrating longer effective memory.
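The core building block behind such convolutional sequence models is the causal dilated convolution: each output sample depends only on the current and past inputs, and dilation spaces the filter taps apart so the receptive field grows with depth. A minimal sketch in NumPy (the function name and weight convention are illustrative, not from the paper):

```python
import numpy as np

def causal_dilated_conv(x, w, dilation=1):
    """Causal dilated 1-D convolution: y[n] = sum_i w[i] * x[n - i*dilation].

    Left-padding with zeros ensures the output never looks at future samples.
    """
    k = len(w)
    pad = dilation * (k - 1)
    xp = np.concatenate([np.zeros(pad), x])
    y = np.zeros(len(x))
    for n in range(len(x)):
        # Gather the taps x[n], x[n-d], x[n-2d], ... from the padded signal.
        taps = xp[pad + n - dilation * np.arange(k)]
        y[n] = np.dot(w, taps)
    return y
```

Stacking such layers with dilations 1, 2, 4, ... doubles the receptive field per layer, which is how these networks achieve the long effective memory noted above.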
Tacotron: Towards End-to-End Speech Synthesis
A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module.
Adversarial Audio Synthesis
Audio signals are sampled at high temporal resolutions, and learning to synthesize audio requires capturing structure across a range of timescales.
Efficient Neural Audio Synthesis
The small number of weights in a Sparse WaveRNN makes it possible to sample high-fidelity audio on a mobile CPU in real time.
DiffWave: A Versatile Diffusion Model for Audio Synthesis
In this work, we propose DiffWave, a versatile diffusion probabilistic model for conditional and unconditional waveform generation.
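Diffusion models of this kind are trained against a fixed forward process that gradually adds Gaussian noise to a clean waveform. The closed-form corruption at step t can be sketched as follows (a generic DDPM-style forward process, not DiffWave's exact schedule or code):

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I).

    betas is the per-step noise schedule; abar_t is the cumulative
    product of (1 - beta) up to step t.
    """
    alphas = 1.0 - betas
    abar_t = np.cumprod(alphas)[t]
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * eps
    return x_t, eps
```

The network is then trained to predict `eps` from `x_t` and `t`; generation runs the process in reverse, starting from pure noise.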
Differentiable All-pole Filters for Time-varying Audio Systems
Infinite impulse response filters are an essential building block of many time-varying audio systems, such as audio effects and synthesisers.
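An all-pole filter is the simplest infinite impulse response structure: each output sample feeds back through the denominator coefficients, so a finite set of coefficients yields an infinitely long impulse response. A minimal direct-form sketch (illustrative; the paper's contribution is making such filters differentiable and time-varying, which this plain recursion does not show):

```python
def allpole_filter(x, a):
    """Direct-form all-pole IIR filter: y[n] = x[n] - sum_k a[k] * y[n-k].

    `a` holds the denominator coefficients a[1..K] (a[0] is assumed to be 1).
    """
    y = []
    for n, xn in enumerate(x):
        yn = xn
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                yn -= ak * y[n - k]  # feedback from past outputs
        y.append(yn)
    return y
```

For example, a single coefficient a = [-0.5] gives an exponentially decaying impulse response, the basic resonant behaviour exploited by audio effects and synthesisers.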
Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders
Generative models in vision have seen rapid progress due to algorithmic improvements and the availability of high-quality image datasets.
GANSynth: Adversarial Neural Audio Synthesis
Efficient audio synthesis is an inherently difficult machine learning task, as human perception is sensitive to both global structure and fine-scale waveform coherence.
CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation
In this paper, we propose Conditional Score-based Diffusion models for Imputation (CSDI), a novel time series imputation method that utilizes score-based diffusion models conditioned on observed data.
BigVGAN: A Universal Neural Vocoder with Large-Scale Training
Despite recent progress in generative adversarial network (GAN)-based vocoders, where the model generates raw waveform conditioned on acoustic features, it is challenging to synthesize high-fidelity audio for numerous speakers across various recording environments.