Freesound Dataset 50k (FSD50K for short) is an open dataset of human-labeled sound events. It contains 51,197 Freesound clips unequally distributed across 200 classes drawn from the AudioSet Ontology. FSD50K was created at the Music Technology Group of Universitat Pompeu Fabra. It consists mainly of sound events produced by physical sound sources and production mechanisms, including human sounds, sounds of things, animals, natural sounds, musical instruments, and more.
119 PAPERS • 2 BENCHMARKS
The Human-Animal-Cartoon (HAC) dataset consists of seven actions (‘sleeping’, ‘watching tv’, ‘eating’, ‘drinking’, ‘swimming’, ‘running’, and ‘opening door’) performed by humans, animals, and cartoon figures, which form three different domains. It contains 3,381 video clips collected from the internet, around 1,000 per domain, and provides three modalities: video, audio, and optical flow.
2 PAPERS • NO BENCHMARKS YET