Recent progress in network-based audio event classification has shown the benefit of pre-training models on visual data such as ImageNet.
We show, both theoretically and experimentally, that the VAE ensemble objective encourages the linear transformations connecting the VAEs to be trivial transformations, aligning the latent representations of the different models to be "alike".
We propose a measure to compute class similarity in large-scale classification based on prediction scores.
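One way such a score-based class-similarity measure could look (a hedged sketch, not the authors' exact formulation: here the hypothetical measure is cosine similarity between the mean prediction-score vectors of each class):

```python
import numpy as np

def class_similarity(scores, labels, n_classes):
    """Cosine similarity between per-class mean prediction-score vectors.

    scores: (n_samples, n_classes) array of model prediction scores
    labels: (n_samples,) array of ground-truth class indices
    """
    # Centroid of the prediction scores for each ground-truth class
    centroids = np.stack(
        [scores[labels == c].mean(axis=0) for c in range(n_classes)]
    )
    # Normalize rows, then the Gram matrix gives pairwise cosine similarities
    normed = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    return normed @ normed.T
```

Classes that the model systematically confuses would produce similar score vectors and hence a high similarity entry.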
Recent advancements in audio event classification often ignore the structure and relation between the label classes available as prior information.
Recent advancements in unsupervised disentangled representation learning focus on extending the variational autoencoder (VAE) with an augmented objective function to balance the trade-off between disentanglement and reconstruction.
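A representative instance of such an augmented objective is the beta-VAE loss, where a weight beta > 1 on the KL term trades reconstruction quality for disentanglement (a minimal sketch, assuming a diagonal-Gaussian encoder and a squared-error reconstruction term; the exact objective varies across methods):

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    """Augmented VAE objective: reconstruction + beta * KL(q(z|x) || N(0, I))."""
    # Squared-error reconstruction term, summed over dimensions, averaged over the batch
    recon = np.mean(np.sum((x - x_recon) ** 2, axis=1))
    # Closed-form KL divergence for a diagonal Gaussian against the unit prior
    kl = np.mean(0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=1))
    # beta > 1 pressures the posterior toward the factorized prior (disentanglement);
    # beta = 1 recovers the standard VAE evidence lower bound
    return recon + beta * kl
```

With a perfect reconstruction and a posterior equal to the prior, both terms vanish, which makes the trade-off explicit: any gain in one term must be paid for in the other.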
Cardiac auscultation is the most practiced non-invasive and cost-effective procedure for the early diagnosis of heart diseases.
The weakly labeled framework eliminates the need for an expensive data-labeling procedure, and self-supervised attention helps the model distinguish between relevant and irrelevant parts of a weakly labeled audio clip more effectively than prior attention models.
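The core pooling step behind such attention models can be sketched as follows (a minimal sketch under assumed names; in the actual model both the segment scores and the attention logits would come from learned network heads):

```python
import numpy as np

def attention_pool(segment_scores, attention_logits):
    """Aggregate per-segment scores into a clip-level score via attention.

    segment_scores: (n_segments,) relevance scores for each time segment
    attention_logits: (n_segments,) unnormalized attention over segments
    """
    # Softmax over time segments (shifted for numerical stability)
    w = np.exp(attention_logits - attention_logits.max())
    w = w / w.sum()
    # Clip-level prediction is the attention-weighted average, so segments
    # deemed irrelevant contribute little to the weak (clip-level) label
    return float(np.sum(w * segment_scores))
```

With uniform attention this reduces to mean pooling; the learned attention lets relevant segments dominate the clip-level decision.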
MIL is a weakly supervised learning problem where labels are associated with groups of instances (referred to as bags) instead of individual instances.
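Under the standard MIL assumption, a bag is positive if at least one of its instances is positive, which max pooling over instance scores implements directly (a minimal sketch of this standard assumption, not a specific paper's model):

```python
import numpy as np

def bag_score(instance_scores):
    """Bag-level prediction from instance-level scores.

    A bag is positive iff at least one instance is positive, so the bag
    score is the maximum instance score.
    """
    return float(np.max(instance_scores))
```

In weakly labeled audio, each clip is a bag and its short time segments are the instances: only the clip-level label is available for training.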
In this work, we propose an ensemble of classifiers to distinguish between various degrees of abnormalities of the heart using Phonocardiogram (PCG) signals acquired using digital stethoscopes in a clinical setting, for the INTERSPEECH 2018 Computational Paralinguistics (ComParE) Heart Beats SubChallenge.
In this work, we propose a novel CNN architecture that integrates the front-end bandpass filters into the network using time-convolution (tConv) layers, making the FIR filter-bank parameters learnable.
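The key observation is that an FIR filter is a 1-D convolution, so a filter bank can be expressed as a convolutional layer whose kernels are the filter taps and are therefore trainable by backpropagation (a minimal numpy sketch of the forward pass only; the actual tConv layer would be a trainable conv layer in a deep-learning framework):

```python
import numpy as np

def tconv_forward(signal, fir_kernels):
    """Apply a bank of FIR filters to a 1-D signal via time-convolution.

    signal: (n_samples,) raw input signal (e.g. a PCG recording)
    fir_kernels: (n_filters, kernel_len) FIR taps; in a tConv layer these
                 are the learnable parameters
    """
    # One output channel per filter; "same" padding preserves signal length
    return np.stack(
        [np.convolve(signal, taps, mode="same") for taps in fir_kernels]
    )
```

Initializing the kernels with conventional bandpass designs and then letting training refine them is the usual motivation for making the front end learnable.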