To foster the investigation of label noise in sound event classification, we present FSDnoisy18k, a dataset containing 42.5 hours of audio across 20 sound classes, including a small amount of manually labeled data and a larger quantity of real-world noisy data.
In this work, we treat SED as a multiple instance learning (MIL) problem, where training labels are static over a short excerpt, indicating the presence or absence of sound sources but not their temporal locality.
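The MIL framing above can be illustrated with a minimal sketch: frame-level (instance) event probabilities are aggregated into a single clip-level (bag) prediction, here with max pooling, so a class is marked present if any frame fires. The array values and the 0.5 threshold are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Frame-level event probabilities for a short excerpt: shape (frames, classes).
# Values are made up for illustration.
frame_probs = np.array([
    [0.10, 0.80, 0.20],
    [0.20, 0.90, 0.10],
    [0.05, 0.30, 0.70],
])

# Under the MIL view, the static clip-level label supervises an aggregate of
# the frame (instance) scores; max pooling flags a class as present whenever
# at least one frame exceeds the threshold.
clip_probs = frame_probs.max(axis=0)          # per-class clip probabilities
clip_labels = (clip_probs > 0.5).astype(int)  # → [0 1 1]
```

Other pooling operators (mean, softmax-weighted) trade off localization sharpness against gradient coverage, but max pooling is the simplest instance of the MIL assumption.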
In this paper, we describe in detail the system we submitted to DCASE2019 task 4: sound event detection (SED) in domestic environments.
Instead of designing a single model by considering a trade-off between the two sub-targets, we design a teacher model aiming at audio tagging to guide a student model aiming at boundary detection to learn using the unlabeled data.
In this paper, we focus on two common problems in SED: how to carry out efficient weakly-supervised learning, and how to learn better from unbalanced datasets in which multiple sound events often co-occur.
In this work, we first describe a CNN-based approach for weakly supervised training of audio event detectors.
As a case study, we consider the problem of detecting avian flight calls from a ten-hour recording of nocturnal bird migration, recorded by a network of six ARUs in the presence of heterogeneous background noise.
In this paper we present an approach to polyphonic sound event detection in real-life recordings based on bi-directional long short-term memory (BLSTM) recurrent neural networks (RNNs).
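A bi-directional recurrence processes each excerpt once forward and once backward, then concatenates the two hidden sequences so every frame sees both past and future context. The toy sketch below uses a plain tanh RNN cell instead of full LSTM gates for brevity, and all sizes and weights are illustrative assumptions, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
T, F, H = 6, 4, 3  # frames, input features, hidden size (toy values)
x = rng.normal(size=(T, F))
# Separate weights for the forward and backward directions.
Wf, Uf = rng.normal(size=(H, F)), rng.normal(size=(H, H))
Wb, Ub = rng.normal(size=(H, F)), rng.normal(size=(H, H))

def run(seq, W, U):
    """Run a simple tanh recurrence over a sequence of frames."""
    h = np.zeros(H)
    out = []
    for x_t in seq:
        h = np.tanh(W @ x_t + U @ h)
        out.append(h)
    return np.stack(out)

fwd = run(x, Wf, Uf)              # past → future context
bwd = run(x[::-1], Wb, Ub)[::-1]  # future → past context, re-aligned in time
h_bi = np.concatenate([fwd, bwd], axis=1)  # (T, 2H) per-frame features
```

A sigmoid output layer applied to `h_bi` frame by frame would then yield independent per-class activities, which is what makes the detection polyphonic: several events can be active in the same frame.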
The number of channels in the CNNs and the size of the weight matrices in the RNNs have a direct effect on the total number of parameters of the SED method, which amounts to a couple of millions.
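How channel counts and recurrent widths drive the parameter budget can be made concrete with standard counting formulas. The layer sizes below are hypothetical examples, not the ones used by the method.

```python
def conv2d_params(in_ch, out_ch, k):
    """Parameters of a k×k conv layer: weights plus one bias per output channel."""
    return out_ch * (in_ch * k * k + 1)

def lstm_params(in_dim, hidden):
    """Parameters of one LSTM layer: 4 gates, each with input weights,
    recurrent weights, and a bias vector."""
    return 4 * (hidden * in_dim + hidden * hidden + hidden)

# Hypothetical stack: two 5×5 conv layers with 96 channels, then one LSTM
# with 1024 inputs and 1024 hidden units.
conv_total = conv2d_params(1, 96, 5) + conv2d_params(96, 96, 5)
lstm_total = lstm_params(1024, 1024)
total = conv_total + lstm_total  # dominated by the LSTM's dense matrices
```

Note how the recurrent term grows quadratically in the hidden size (`hidden * hidden` per gate), so the RNN width, more than the conv channels, is what pushes such models into the millions of parameters.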
In contrast, with our method there is a 4% decrease in F1 score and a 7% increase in ER for the TUT-SED Synthetic 2016 dataset.