Classifying Variable-Length Audio Files with All-Convolutional Networks and Masked Global Pooling

11 Jul 2016 · Lars Hertel, Huy Phan, Alfred Mertins

We trained a deep all-convolutional neural network with masked global pooling to perform single-label acoustic scene classification and multi-label domestic audio tagging in the DCASE-2016 contest. In four-fold cross-validation, our network achieved an average accuracy of 84.5% for acoustic scene classification, compared to the provided baseline of 72.5%, and an average equal error rate of 0.17 for domestic audio tagging, compared to the baseline of 0.21...
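The abstract names masked global pooling without defining it here. The following is a minimal sketch of one common reading, not the authors' implementation: clips of different lengths are zero-padded to a common number of frames, and the pooling step averages each channel over the valid frames only, so padding does not influence the result. The function name `masked_global_average_pool`, the `(clip, frame, channel)` nested-list layout, and the choice of average (rather than max) pooling are all assumptions made for illustration.

```python
def masked_global_average_pool(features, lengths):
    """Average each channel over the valid (unpadded) frames of every clip.

    features: nested list indexed as [clip][frame][channel], padded to a
              common number of frames (hypothetical layout).
    lengths:  number of valid frames per clip.
    Returns one pooled vector (a list of channel means) per clip.
    """
    pooled = []
    for clip, n_valid in zip(features, lengths):
        if n_valid < 1 or n_valid > len(clip):
            raise ValueError("lengths must lie between 1 and the padded length")
        # 1.0 for real frames, 0.0 for padded frames.
        mask = [1.0 if t < n_valid else 0.0 for t in range(len(clip))]
        n_channels = len(clip[0])
        pooled.append([
            sum(m * frame[c] for m, frame in zip(mask, clip)) / n_valid
            for c in range(n_channels)
        ])
    return pooled


# Two clips padded to three frames; the first has only two valid frames,
# and its padded frame holds arbitrary values that the mask must ignore.
example = [
    [[1.0, 2.0], [3.0, 4.0], [99.0, 99.0]],
    [[2.0, 0.0], [4.0, 2.0], [6.0, 4.0]],
]
result = masked_global_average_pool(example, [2, 3])
```

Because the output has a fixed size per clip regardless of the input length, a classifier layer can follow it directly, which is what makes this kind of pooling suitable for variable-length audio.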

