49 papers with code • 3 benchmarks • 7 datasets
Audio generation (synthesis) is the task of generating raw audio such as speech.
(Image credit: MelNet)
Capturing high-level structure in audio waveforms is challenging because a single second of audio spans tens of thousands of timesteps.
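To make the scale concrete, the short sketch below counts the timesteps in a clip at a few common sample rates. The specific rates are illustrative assumptions and are not taken from the text above.

```python
# Illustrative sample rates in Hz (common choices, assumed for this example).
SAMPLE_RATES_HZ = {"telephone": 8_000, "speech": 16_000, "CD audio": 44_100}


def timesteps(duration_s: float, sample_rate_hz: int) -> int:
    """Return the number of discrete samples in a clip of the given duration."""
    return round(duration_s * sample_rate_hz)


for name, rate in SAMPLE_RATES_HZ.items():
    print(f"{name}: {timesteps(1.0, rate)} timesteps per second")
```

Even at a modest speech rate, structure that spans a few seconds (a word, a phrase, a musical bar) corresponds to dependencies across tens of thousands of samples.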
In this paper, we propose a novel model for unconditional audio generation that generates one audio sample at a time.
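As a rough illustration of what generating one sample at a time means, the sketch below runs a generic autoregressive sampling loop. It is not the model from the paper: `toy_next_sample_probs` is a hypothetical stand-in for a learned network, and the 256-level quantization is an assumption made for the example.

```python
import math
import random

NUM_LEVELS = 256  # assumed 8-bit quantization of sample amplitudes


def toy_next_sample_probs(history):
    """Hypothetical stand-in for a learned model.

    Returns a distribution over amplitude levels that favours values
    close to the most recent sample (or mid-range if there is none).
    """
    last = history[-1] if history else NUM_LEVELS // 2
    weights = [math.exp(-abs(level - last) / 4.0) for level in range(NUM_LEVELS)]
    total = sum(weights)
    return [w / total for w in weights]


def generate(num_samples, seed=0):
    """Draw samples one at a time, each conditioned on all previous ones."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        probs = toy_next_sample_probs(samples)
        next_sample = rng.choices(range(NUM_LEVELS), weights=probs, k=1)[0]
        samples.append(next_sample)  # the new sample becomes part of the context
    return samples


waveform = generate(100)
```

The loop makes the cost visible: producing one second of audio requires one model evaluation per sample, so generation time grows with the sample rate.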
End-to-end models for raw audio generation are challenging to build, especially when they must work with non-parallel data, which is a desirable setup in many situations.