3 dataset results for Semantic Segmentation AND Audio

The Synthesized Lakh (Slakh) Dataset is a dataset for audio source separation that is synthesized from the Lakh MIDI Dataset v0.1 using professional-grade sample-based virtual instruments. This first release of Slakh, called Slakh2100, contains 2100 automatically mixed tracks and accompanying MIDI files synthesized using a professional-grade sampling engine. The tracks in Slakh2100 are split into training (1500 tracks), validation (375 tracks), and test (225 tracks) subsets, totaling 145 hours of mixtures.

24 PAPERS • 2 BENCHMARKS

AVSBench (Audio −Visual Segmentation)

AVSBench is a pixel-level audio-visual segmentation benchmark that provides ground truth labels for sounding objects. The dataset is divided into three subsets: AVSBench-object (Single-source subset, Multi-sources subset) and AVSBench-semantic (Semantic-labels subset). Accordingly, three settings are studied:

10 PAPERS • NO BENCHMARKS YET

Freiburg Terrains

Freiburg Terrains consist of three parts: 3.7 hours of audio recordings of the microphone pointed at the robot wheels. It also contains 24K RGB images from the camera mounted on top of the robot. The dataset creators also provide the SLAM poses for each data collection run. The dataset can be used for terrain classification which is useful for agent navigation tasks.

1 PAPER • NO BENCHMARKS YET

Datasets

3 dataset results for Semantic Segmentation AND Audio