Interpretability of deep neural networks is a recently emerging area of machine learning research targeting a better understanding of how models perform feature selection and derive their classification decisions.
Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio.
We introduce a unified probabilistic approach for deep continual learning based on variational Bayesian inference with open set recognition.
Based on this, we introduce a method for descriptor-based synthesis and show that we can control the descriptors of an instrument while keeping its timbre structure.
In this work, we first describe a CNN-based approach for weakly supervised training of audio events.
Traditional convolutional layers extract features from patches of data by applying a non-linearity on an affine function of the input.
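As a rough illustration of the sentence above, the sketch below writes out a single-channel convolutional layer patch by patch: each output value is a non-linearity applied to an affine function (weighted sum plus bias) of one input patch. The function name `conv2d_patchwise`, the choice of ReLU as the non-linearity, and the toy input are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np


def conv2d_patchwise(x, w, b):
    """Single-channel 'valid' convolution followed by ReLU, one patch at a time.

    x: 2-D input array, w: 2-D kernel, b: scalar bias.
    ReLU is an assumed choice of non-linearity for this sketch.
    """
    kh, kw = w.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i:i + kh, j:j + kw]      # local patch of the input
            affine = np.sum(patch * w) + b     # affine function of the patch
            out[i, j] = max(affine, 0.0)       # non-linearity (ReLU)
    return out


# Toy input: a 4x4 ramp where x[i, j] = 4*i + j.
x = np.arange(16, dtype=float).reshape(4, 4)
# Kernel that computes x[i+1, j+1] - x[i, j], which is 5 for every patch.
w = np.array([[-1.0, 0.0], [0.0, 1.0]])
y = conv2d_patchwise(x, w, b=-2.0)
print(y.shape)  # (3, 3)
print(y)        # every entry is 5 - 2 = 3.0
```

The explicit double loop is deliberately slow; it only makes the "affine function of a patch, then non-linearity" structure visible. Deep learning frameworks compute the same quantity with vectorized kernels.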
Long-range dependency modeling, widely used to capture spatiotemporal correlation, has been shown to be effective in CNN-dominated computer vision tasks.
In this paper, we investigate how to learn rich and robust feature representations for audio classification from visual data and a novel audio data modality, namely acoustic images.
We present the ConditionaL Neural Network (CLNN) and the Masked ConditionaL Neural Network (MCLNN) designed for temporal signal recognition.
Feature learning and deep learning have drawn great attention in recent years as a way of transforming input data into more effective representations using learning algorithms.