One Billion Audio Sounds from GPU-enabled Modular Synthesis

27 Apr 2021  ·  Joseph Turian, Jordie Shier, George Tzanetakis, Kirk McNally, Max Henry ·

We release synth1B1, a multi-modal audio corpus consisting of 1 billion 4-second synthesized sounds, paired with the synthesis parameters used to generate them. The dataset is 100x larger than any audio dataset in the literature. We also introduce torchsynth, an open-source modular synthesizer that generates the synth1B1 samples on-the-fly at 16200x faster than real-time (714MHz) on a single GPU. We additionally release two new audio datasets: FM synth timbre and subtractive synth pitch. Using these datasets, we demonstrate new rank-based evaluation criteria for existing audio representations. Finally, we propose a novel approach to synthesizer hyperparameter optimization.
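Because torchsynth renders synth1B1 batches on demand rather than shipping a terabyte-scale corpus, a short sketch of how a batch might be generated can be useful. This is a minimal, hedged example assuming torchsynth's default `Voice` synthesizer and the convention (per the project documentation) that calling it with a batch index returns the rendered audio together with the underlying synthesis parameters; the exact return signature may differ across torchsynth versions.

```python
# Minimal sketch: rendering one synth1B1 batch with torchsynth.
# Assumption: Voice(batch_idx) returns (audio, parameters, train/test flags).
import torch
from torchsynth.synth import Voice

voice = Voice()  # default config renders a batch of 4-second sounds
if torch.cuda.is_available():
    voice = voice.to("cuda")  # GPU rendering is where the large speedup comes from

# The batch index deterministically selects a slice of the synth1B1 corpus,
# so any batch can be regenerated on-the-fly from its index alone.
audio, params, is_train = voice(0)
print(audio.shape)   # (batch_size, num_audio_samples)
print(params.shape)  # (batch_size, num_synth_parameters)
```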


Datasets


Introduced in the Paper:

DX7 Timbre Dataset

Used in the Paper:

NSynth

