Our results indicate that a simple convolutional architecture outperforms canonical recurrent networks such as LSTMs across a diverse range of tasks and datasets, while demonstrating longer effective memory.
In this paper we compare different types of recurrent units in recurrent neural networks (RNNs).
Based on this, we introduce a method for descriptor-based synthesis and show that we can control the descriptors of an instrument while keeping its timbre structure.
Generating musical audio directly with neural networks is notoriously difficult because it requires coherently modeling structure at many different timescales.
Our goal is to build a generative model, based on a deep neural network architecture, that creates music with both harmony and melody and that could pass as music composed by humans.