The MAESTRO dataset contains over 200 hours of paired audio and MIDI recordings from ten years of the International Piano-e-Competition. The MIDI data includes key-strike velocities and sustain, sostenuto, and una corda pedal positions. Audio and MIDI files are aligned with ∼3 ms accuracy and sliced into individual musical pieces, which are annotated with composer, title, and year of performance. Uncompressed audio is of CD quality or higher (44.1–48 kHz, 16-bit PCM, stereo).
110 PAPERS • 1 BENCHMARK
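As a rough illustration of how the MIDI annotations described above can be inspected, the sketch below uses the pretty_midi library; the metadata filename, its column names, and the directory layout are assumptions based on a typical MAESTRO v3 download, not guaranteed by this description.

import pandas as pd
import pretty_midi

# Assumed metadata CSV and column names from a MAESTRO v3 download.
meta = pd.read_csv("maestro-v3.0.0/maestro-v3.0.0.csv")
row = meta.iloc[0]
print(row["canonical_composer"], "-", row["canonical_title"], f"({row['year']})")

midi = pretty_midi.PrettyMIDI(f"maestro-v3.0.0/{row['midi_filename']}")
piano = midi.instruments[0]  # solo piano performances: a single instrument track

velocities = [note.velocity for note in piano.notes]                  # key-strike velocities
sustain = [cc for cc in piano.control_changes if cc.number == 64]     # sustain pedal (CC 64)
una_corda = [cc for cc in piano.control_changes if cc.number == 67]   # una corda / soft pedal (CC 67)

print(f"{len(piano.notes)} notes, mean velocity {sum(velocities) / len(velocities):.1f}")
print(f"{len(sustain)} sustain-pedal events, {len(una_corda)} una corda events")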
MusicNet is a collection of 330 freely licensed classical music recordings, together with over 1 million annotated labels indicating the precise time of each note in every recording, the instrument that plays each note, and the note's position in the metrical structure of the composition. The labels were acquired from musical scores aligned to the recordings by dynamic time warping and verified by trained musicians; the estimated labeling error rate is 4%. The MusicNet labels are offered to the machine learning and music communities as a resource for training models and a common benchmark for comparing results.
42 PAPERS • 1 BENCHMARK
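A minimal sketch of how the note labels described above might be read, assuming the CSV layout of the original MusicNet release with per-recording label files whose times are sample indices at 44.1 kHz; the column names, the sample-rate convention, and the example path are assumptions.

import csv

SAMPLE_RATE = 44100  # assumed: label times are sample indices at 44.1 kHz

def load_labels(csv_path):
    """Read one MusicNet label file into a list of note events (assumed column names)."""
    notes = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            notes.append({
                "onset_s": int(row["start_time"]) / SAMPLE_RATE,
                "offset_s": int(row["end_time"]) / SAMPLE_RATE,
                "instrument": int(row["instrument"]),    # instrument playing the note
                "pitch": int(row["note"]),               # MIDI note number
                "start_beat": float(row["start_beat"]),  # position in the metrical structure
            })
    return notes

# Hypothetical path to one recording's labels.
notes = load_labels("musicnet/train_labels/1727.csv")
print(len(notes), "labelled notes; first onset at", round(notes[0]["onset_s"], 3), "s")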
URMP (University of Rochester Multi-Modal Musical Performance) is a dataset for facilitating audio-visual analysis of musical performances. It comprises 44 simple multi-instrument musical pieces assembled from coordinated but separately recorded performances of individual tracks. For each piece, the dataset provides the musical score in MIDI format, high-quality individual-instrument audio recordings, and videos of the assembled pieces.
37 PAPERS • 2 BENCHMARKS
The Synthesized Lakh (Slakh) Dataset is a dataset for audio source separation that is synthesized from the Lakh MIDI Dataset v0.1 using professional-grade sample-based virtual instruments. This first release of Slakh, called Slakh2100, contains 2100 automatically mixed tracks and accompanying MIDI files synthesized using a professional-grade sampling engine. The tracks in Slakh2100 are split into training (1500 tracks), validation (375 tracks), and test (225 tracks) subsets, totaling 145 hours of mixtures.
35 PAPERS • 3 BENCHMARKS
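The sketch below shows one way the Slakh2100 split sizes given above might be verified on disk; the split folder names, the track naming, and the per-track file layout (mixture, audio stems, per-stem MIDI) are assumptions about a typical download rather than facts stated here.

from pathlib import Path

root = Path("slakh2100")  # hypothetical local path to the dataset

# Expected track counts per split: 1500 / 375 / 225 (2100 total).
for split in ("train", "validation", "test"):
    tracks = sorted((root / split).glob("Track*"))
    print(f"{split}: {len(tracks)} tracks")

track = sorted((root / "train").glob("Track*"))[0]
stems = sorted((track / "stems").glob("*.flac"))  # assumed per-source audio stems
midis = sorted((track / "MIDI").glob("*.mid"))    # assumed per-source MIDI files
print(track.name, "-", len(stems), "stems,", len(midis), "MIDI files")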
Music21 is an untrimmed video dataset crawled from YouTube using keyword queries. It contains music performances belonging to 21 categories. The dataset is relatively clean and was collected for training and evaluating visual sound source separation models.
33 PAPERS • NO BENCHMARKS YET
ASAP is a dataset of 222 digital musical scores aligned with 1068 performances (more than 92 hours) of Western classical piano music.
12 PAPERS • 2 BENCHMARKS
CocoChorales is a dataset consisting of over 1,400 hours of audio mixtures containing four-part chorales performed by 13 instruments, all synthesized with realistic-sounding generative models. CocoChorales contains mixes, sources, and MIDI data, as well as annotations for note expression (e.g., per-note volume and vibrato) and synthesis parameters (e.g., multi-f0).
7 PAPERS • NO BENCHMARKS YET
MAPS, standing for MIDI Aligned Piano Sounds, is a database of MIDI-annotated piano recordings. MAPS was designed for release to the music information retrieval research community, in particular for the development and evaluation of algorithms for single-pitch or multi-pitch estimation and automatic music transcription. It comprises isolated notes, random-pitch chords, usual musical chords, and pieces of music, and provides a large amount of sound material obtained under various recording conditions.
6 PAPERS • 1 BENCHMARK
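Since MAPS targets the development and evaluation of multi-pitch estimation and transcription algorithms, a common workflow is to score estimated notes against the MIDI-aligned ground truth. The sketch below uses mir_eval (an external library, not part of MAPS) with made-up reference and estimated notes purely to illustrate the metric; real usage would load note intervals and pitches from the dataset's annotations.

import numpy as np
import mir_eval

# Reference notes: (onset, offset) intervals in seconds and pitches in Hz.
# These values are made up for illustration only.
ref_intervals = np.array([[0.50, 1.20], [1.25, 2.00], [2.10, 2.80]])
ref_pitches = np.array([261.63, 329.63, 392.00])   # C4, E4, G4

# Estimated notes from a hypothetical transcription system.
est_intervals = np.array([[0.52, 1.18], [1.30, 1.95]])
est_pitches = np.array([261.63, 329.63])

precision, recall, f1, overlap = mir_eval.transcription.precision_recall_f1_overlap(
    ref_intervals, ref_pitches, est_intervals, est_pitches, onset_tolerance=0.05)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")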
A MIDI dataset of 500 four-part chorales generated by the KS_Chorus algorithm, annotated with results from hundreds of listening-test participants, together with 500 further unannotated chorales.
3 PAPERS • NO BENCHMARKS YET
This dataset contains transcriptions of electric guitar performances of 240 tablatures, rendered with different tones. Its goal is to contribute to automatic music transcription (AMT) of guitar music, a technically challenging task.
1 PAPER • NO BENCHMARKS YET
This audio dataset contains about 1,500 audio clips recorded by multiple professional players.
Guitar-TECHS is a comprehensive dataset featuring a variety of guitar techniques, musical excerpts, chords, and scales, performed by diverse musicians in various recording settings. It incorporates recordings from two stereo microphones: an egocentric microphone positioned on the performer's head and an exocentric microphone placed in front of the performer. It also includes direct-input recordings and microphoned amplifier outputs, offering a wide spectrum of audio inputs and recording qualities. All signals and MIDI labels are properly synchronized. Its multi-perspective and multi-modal content makes Guitar-TECHS a valuable resource for advancing data-driven guitar research and for developing robust guitar listening algorithms.
We redistribute a suite of datasets as part of the YourMT3 project. The license for redistribution is attached.
This publicly available dataset is synthesized audio for woodwind quartets, including renderings of each instrument in isolation. The data was created as training material for the task on rebalancing classical music ensembles in Cadenza's second open machine learning challenge (CAD2), and is also intended for developing other music information retrieval (MIR) algorithms using machine learning. It was created to address the lack of large-scale datasets of classical woodwind music with separate audio for each instrument and a permissive license for reuse. Music scores were selected from the OpenScore String Quartet corpus and rendered for two woodwind ensembles: (i) flute, oboe, clarinet, and bassoon; and (ii) flute, oboe, alto saxophone, and bassoon. The rendering was carried out by a professional music producer using industry-standard software, with virtual instruments generating the audio for each instrument from software that interpreted expression markings in the score.
0 PAPERS • NO BENCHMARKS YET