Interpretability of deep neural networks is a recently emerging area of machine learning research targeting a better understanding of how models perform feature selection and derive their classification decisions.
Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio.
We introduce a probabilistic approach to unify deep continual learning with open set recognition, based on variational Bayesian inference.
Based on this, we introduce a method for descriptor-based synthesis and show that we can control the descriptors of an instrument while keeping its timbre structure.
In this work, we first describe a CNN-based approach for weakly supervised training of audio events.
Traditional convolutional layers extract features from patches of data by applying a non-linearity to an affine function of the input.
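As an illustration of the "non-linearity applied to an affine function of a patch" description, here is a minimal one-dimensional sketch in Python with NumPy. The function name `conv1d_relu`, the choice of ReLU as the non-linearity, and the toy signal and weights are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def conv1d_relu(signal, weights, bias, stride=1):
    """Slide a window over `signal` and return relu(w . patch + b) per patch.

    Hypothetical helper for illustration only; ReLU is an assumed choice
    of non-linearity.
    """
    signal = np.asarray(signal, dtype=float)
    weights = np.asarray(weights, dtype=float)
    k = weights.shape[0]                          # patch (kernel) length
    outputs = []
    for start in range(0, signal.shape[0] - k + 1, stride):
        patch = signal[start:start + k]           # local patch of the input
        affine = float(weights @ patch + bias)    # affine function of the patch
        outputs.append(max(affine, 0.0))          # non-linearity (ReLU)
    return np.array(outputs)

# Toy input: three overlapping patches of length 2.
features = conv1d_relu([1.0, 3.0, 2.0, 5.0], weights=[-1.0, 1.0], bias=-0.5)
print(features)
```

Each output value depends only on one local patch, and the same weights and bias are reused at every position, which is what distinguishes a convolutional layer from a fully connected one.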
Long-range dependency modeling, widely used to capture spatiotemporal correlations, has been shown to be effective in CNN-dominated computer vision tasks.
In this paper, we investigate how to learn rich and robust feature representations for audio classification from visual data and acoustic images, a novel audio data modality.
The proposed method uses an auxiliary classifier, trained on data that is known to be in-distribution, for detection and relabelling.
We present the ConditionaL Neural Network (CLNN) and the Masked ConditionaL Neural Network (MCLNN) designed for temporal signal recognition.