The JSB chorales are a set of short, four-voice pieces of music well-noted for their stylistic homogeneity. The chorales were originally composed by Johann Sebastian Bach in the 18th century. He wrote them by first taking pre-existing melodies from contemporary Lutheran hymns and then harmonising them to create the parts for the remaining three voices. The version of the dataset used canonically in representation learning contexts consists of 382 such chorales, with a train/validation/test split of 229, 76 and 77 samples respectively.
28 PAPERS • 1 BENCHMARK
The Synthesized Lakh (Slakh) Dataset is a dataset for audio source separation that is synthesized from the Lakh MIDI Dataset v0.1 using professional-grade sample-based virtual instruments. This first release of Slakh, called Slakh2100, contains 2100 automatically mixed tracks and accompanying MIDI files synthesized using a professional-grade sampling engine. The tracks in Slakh2100 are split into training (1500 tracks), validation (375 tracks), and test (225 tracks) subsets, totaling 145 hours of mixtures.
8 PAPERS • 1 BENCHMARK
GiantMIDI-Piano contains 10,854 unique piano solo pieces composed by 2,786 composers. GiantMIDI-Piano contains 34,504,873 transcribed notes, and contains metadata information of each music piece.
4 PAPERS • NO BENCHMARKS YET
The ADL Piano MIDI is a dataset of 11,086 piano pieces from different genres. This dataset is based on the Lakh MIDI dataset, which is a collection on 45,129 unique MIDI files that have been matched to entries in the Million Song Dataset. Most pieces in the Lakh MIDI dataset have multiple instruments, so for each file the authors of ADL Piano MIDI dataset extracted only the tracks with instruments from the "Piano Family" (MIDI program numbers 1-8). This process generated a total of 9,021 unique piano MIDI files. Theses 9,021 files were then combined with other approximately 2,065 files scraped from publicly-available sources on the internet. All the files in the final collection were de-duped according to their MD5 checksum.
3 PAPERS • NO BENCHMARKS YET
EMOPIA (pronounced ‘yee-mò-pi-uh’) dataset is a shared multi-modal (audio and MIDI) database focusing on perceived emotion in pop piano music, to facilitate research on various tasks related to music emotion. The dataset contains 1,087 music clips from 387 songs and clip-level emotion labels annotated by four dedicated annotators.
3 PAPERS • NO BENCHMARKS YET
DadaGP is a new symbolic music dataset comprising 26,181 song scores in the GuitarPro format covering 739 musical genres, along with an accompanying tokenized format well-suited for generative sequence models such as the Transformer. The tokenized format is inspired by event-based MIDI encodings, often used in symbolic music generation models. The dataset is released with an encoder/decoder which converts GuitarPro files to tokens and back.
1 PAPER • NO BENCHMARKS YET