To foster the investigation of label noise in sound event classification we present FSDnoisy18k, a dataset containing 42. 5 hours of audio across 20 sound classes, including a small amount of manually-labeled data and a larger quantity of real-world noisy data.
In this work, we treat SED as a multiple instance learning (MIL) problem, where training labels are static over a short excerpt, indicating the presence or absence of sound sources but not their temporal locality.
In this paper we present an approach to polyphonic sound event detection in real life recordings based on bi-directional long short term memory (BLSTM) recurrent neural networks (RNNs).
We propose a disentangled feature for weakly supervised multiclass sound event detection (SED), which helps ameliorate the performance and the training efficiency of class-wise attention based detection system by the introduction of more class-wise prior information as well as the network redundancy weight reduction.
As a case study, we consider the problem of detecting avian flight calls from a ten-hour recording of nocturnal bird migration, recorded by a network of six ARUs in the presence of heterogeneous background noise.
Sound events often occur in unstructured environments where they exhibit wide variations in their frequency content and temporal structure.