8 dataset results for Audio Tagging

Audioset is an audio event dataset, which consists of over 2M human-annotated 10-second video clips. These clips are collected from YouTube, therefore many of which are in poor-quality and contain multiple sound-sources. A hierarchical ontology of 632 event classes is employed to annotate these data, which means that the same sound could be annotated as different labels. For example, the sound of barking is annotated as Animal, Pets, and Dog. All the videos are split into Evaluation/Balanced-Train/Unbalanced-Train set.

588 PAPERS • 6 BENCHMARKS

FSDKaggle2018

FSDKaggle2018 is an audio dataset containing 11,073 audio files annotated with 41 labels of the AudioSet Ontology. FSDKaggle2018 has been used for the DCASE Challenge 2018 Task 2. All audio samples are gathered from Freesound and are provided as uncompressed PCM 16 bit, 44.1 kHz mono audio files. The 41 categories of the AudioSet Ontology are: "Acoustic_guitar", "Applause", "Bark", "Bass_drum", "Burping_or_eructation", "Bus", "Cello", "Chime", "Clarinet", "Computer_keyboard", "Cough", "Cowbell", "Double_bass", "Drawer_open_or_close", "Electric_piano", "Fart", "Finger_snapping", "Fireworks", "Flute", "Glockenspiel", "Gong", "Gunshot_or_gunfire", "Harmonica", "Hi-hat", "Keys_jangling", "Knock", "Laughter", "Meow", "Microwave_oven", "Oboe", "Saxophone", "Scissors", "Shatter", "Snare_drum", "Squeak", "Tambourine", "Tearing", "Telephone", "Trumpet", "Violin_or_fiddle", "Writing".

11 PAPERS • 1 BENCHMARK

VocalSound

VocalSound is a free dataset consisting of 21,024 crowdsourced recordings of laughter, sighs, coughs, throat clearing, sneezes, and sniffs from 3,365 unique subjects. The VocalSound dataset also contains meta-information such as speaker age, gender, native language, country, and health condition.

10 PAPERS • 1 BENCHMARK

SSC (Spiking Speech Commands v0.2)

The SSC dataset is a spiking version of the Speech Commands dataset release by Google (Speech Commands). SSC was generated using Lauscher, an artificial cochlea model. The SSC dataset consists of utterances recorded from a larger number of speakers under controlled conditions. Spikes were generated in 700 input channels, and it contains 35 word categories from a large number of speakers.

6 PAPERS • 1 BENCHMARK

FSDKaggle2019

FSDKaggle2019 is an audio dataset containing 29,266 audio files annotated with 80 labels of the AudioSet Ontology. FSDKaggle2019 has been used for the DCASE Challenge 2019 Task 2, which was run as a Kaggle competition titled Freesound Audio Tagging 2019. The dataset allows development and evaluation of machine listening methods in conditions of label noise, minimal supervision, and real-world acoustic mismatch. FSDKaggle2019 consists of two train sets and one test set. One train set and the test set consists of manually-labeled data from Freesound, while the other train set consists of noisily labeled web audio data from Flickr videos taken from the YFCC dataset. The curated train set consists of manually labeled data from FSD: 4970 total clips with a total duration of 10.5 hours. The noisy train set has 19,815 clips with a total duration of 80 hours. The test set has 4481 clips with a total duration of 12.9 hours.

5 PAPERS • NO BENCHMARKS YET

CHiME-Home

CHiME-Home is a dataset for sound source recognition in a domestic environment. It uses around 6.8 hours of domestic environment audio recordings. The recordings were obtained from the CHiME projects – computational hearing in multisource environments – where recording equipment was positioned inside an English Victorian semi-detached house. The recordings were selected from 22 sessions totalling 19.5 hours, with each session made between 7:30 in the morning and 20:00 in the evening. In the considered recordings, the equipment was placed in the lounge (sitting room) near the door opening onto a hallway, with the hallway opening onto a kitchen with no door. With the lounge door typically open, prominent sounds thus may originate from sources both in the lounge and kitchen.

4 PAPERS • NO BENCHMARKS YET

SONYC-UST-V2

A dataset for urban sound tagging with spatiotemporal information. This dataset is aimed for the development and evaluation of machine listening systems for real-world urban noise monitoring. While datasets of urban recordings are available, this dataset provides the opportunity to investigate how spatiotemporal metadata can aid in the prediction of urban sound tags. SONYC-UST-V2 consists of 18510 audio recordings from the "Sounds of New York City" (SONYC) acoustic sensor network, including the timestamp of audio acquisition and location of the sensor.

4 PAPERS • NO BENCHMARKS YET

SINGA:PURA

SINGA:PURA (SINGApore: Polyphonic URban Audio)

This repository contains the SINGA:PURA dataset, a strongly-labelled polyphonic urban sound dataset with spatiotemporal context. The data were collected via a number of recording units deployed across Singapore as a part of a wireless acoustic sensor network. These recordings were made as part of a project to identify and mitigate noise sources in Singapore, but also possess a wider applicability to sound event detection, classification, and localization. The taxonomy we used for the labels in this dataset has been designed to be compatible with other existing datasets for urban sound tagging while also able to capture sound events unique to the Singaporean context. Please refer to our conference paper published in APSIPA 2021 (which is found in this repository as the file "APSIPA.pdf") or download the readme ("Readme.md") for more details regarding the data collection, annotation, and processing methodologies for the creation of the dataset.

2 PAPERS • NO BENCHMARKS YET

Datasets

8 dataset results for Audio Tagging