NSynth is a dataset of one shot instrumental notes, containing 305,979 musical notes with unique pitch, timbre and envelope. The sounds were collected from 1006 instruments from commercial sample libraries and are annotated based on their source (acoustic, electronic or synthetic), instrument family and sonic qualities. The instrument families used in the annotation are bass, brass, flute, guitar, keyboard, mallet, organ, reed, string, synth lead and vocal. Four second monophonic 16kHz audio snippets were generated (notes) for the instruments.
121 PAPERS • 3 BENCHMARKS
The MAESTRO dataset contains over 200 hours of paired audio and MIDI recordings from ten years of International Piano-e-Competition. The MIDI data includes key strike velocities and sustain/sostenuto/una corda pedal positions. Audio and MIDI files are aligned with ∼3 ms accuracy and sliced to individual musical pieces, which are annotated with composer, title, and year of performance. Uncompressed audio is of CD quality or higher (44.1–48 kHz 16-bit PCM stereo).
89 PAPERS • NO BENCHMARKS YET
Music21 is an untrimmed video dataset crawled by keyword query from Youtube. It contains music performances belonging to 21 categories. This dataset is relatively clean and collected for the purpose of training and evaluating visual sound source separation models.
30 PAPERS • NO BENCHMARKS YET
URMP (University of Rochester Multi-Modal Musical Performance) is a dataset for facilitating audio-visual analysis of musical performances. The dataset comprises 44 simple multi-instrument musical pieces assembled from coordinated but separately recorded performances of individual tracks. For each piece the dataset provided the musical score in MIDI format, the high-quality individual instrument audio recordings and the videos of the assembled pieces.
POP909 is a dataset which contains multiple versions of the piano arrangements of 909 popular songs created by professional musicians. The main body of the dataset contains the vocal melody, the lead instrument melody, and the piano accompaniment for each song in MIDI format, which are aligned to the original audio files. Furthermore, annotations are provided of tempo, beat, key, and chords, where the tempo curves are hand-labelled and others are done by MIR algorithms.
28 PAPERS • NO BENCHMARKS YET
The Lakh MIDI dataset is a collection of 176,581 unique MIDI files, 45,129 of which have been matched and aligned to entries in the Million Song Dataset. Its goal is to facilitate large-scale music information retrieval, both symbolic (using the MIDI files alone) and audio content-based (using information extracted from the MIDI files as annotations for the matched audio files). Around 10% of all MIDI files include timestamped lyrics events with lyrics are often transcribed at the word, syllable or character level.
27 PAPERS • NO BENCHMARKS YET
MuseData is an electronic library of orchestral and piano classical music from CCARH. It consists of around 3MB of 783 files.
EMOPIA (pronounced ‘yee-mò-pi-uh’) dataset is a shared multi-modal (audio and MIDI) database focusing on perceived emotion in pop piano music, to facilitate research on various tasks related to music emotion. The dataset contains 1,087 music clips from 387 songs and clip-level emotion labels annotated by four dedicated annotators.
14 PAPERS • NO BENCHMARKS YET
The Groove MIDI Dataset (GMD) is composed of 13.6 hours of aligned MIDI and (synthesized) audio of human-performed, tempo-aligned expressive drumming. The dataset contains 1,150 MIDI files and over 22,000 measures of drumming.
11 PAPERS • NO BENCHMARKS YET
The Lakh Pianoroll Dataset (LPD) is a collection of 174,154 multitrack pianorolls derived from the Lakh MIDI Dataset (LMD).
9 PAPERS • NO BENCHMARKS YET
The Nintendo Entertainment System Music Database (NES-MDB) is a dataset intended for building automatic music composition systems for the NES audio synthesizer. It consists of 5278 songs from the soundtracks of 397 NES games. The dataset represents 296 unique composers, and the songs contain more than two million notes combined. It has file format options for MIDI, score and NLM (NES Language Modeling).
6 PAPERS • NO BENCHMARKS YET
ASAP is a dataset of 222 digital musical scores aligned with 1068 performances (more than 92 hours) of Western classical piano music.
4 PAPERS • NO BENCHMARKS YET
The Bach Doodle Dataset is composed of 21.6 million harmonizations submitted from the Bach Doodle. The dataset contains both metadata about the composition (such as the country of origin and feedback), as well as a MIDI of the user-entered melody and a MIDI of the generated harmonization. The dataset contains about 6 years of user entered music.
ComMU has 11,144 MIDI samples that consist of short note sequences created by professional composers with their corresponding 12 metadata. This dataset is designed for a new task, combinatorial music generation which generate diverse and high-quality music only with metadata through auto-regressive language model.
dMelodies is dataset of simple 2-bar melodies generated using 9 independent latent factors of variation where each data point represents a unique melody based on the following constraints: - Each melody will correspond to a unique scale (major, minor, blues, etc.). - Each melody plays the arpeggios using the standard I-IV-V-I cadence chord pattern. - Bar 1 plays the first 2 chords (6 notes), Bar 2 plays the second 2 chords (6 notes). - Each played note is an 8th note.
The MusicBench dataset is a music audio-text pair dataset that was designed for text-to-music generation purpose and released along with Mustango text-to-music model. MusicBench is based on the MusicCaps dataset, which it expands from 5,521 samples to 52,768 training and 400 test samples!
1 PAPER • 1 BENCHMARK