Audio Classification

48 papers with code • 7 benchmarks • 14 datasets

Audio classification, or audio tagging, is the task of predicting the tags associated with an audio clip.
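Audio tagging is commonly framed as multi-label prediction: each tag receives its own independent probability, so a single clip may carry zero, one, or several tags. Below is a minimal sketch that assumes a model has already produced one logit per tag. The tag names, logit values, and the 0.5 threshold are hypothetical and only for illustration.

```python
import math
from typing import Dict, List


def sigmoid(x: float) -> float:
    """Logistic function mapping a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_tags(logits: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return every tag whose independent sigmoid probability meets the threshold.

    Unlike single-label classification (softmax + argmax), each tag is scored
    on its own, so a clip may receive zero, one, or several tags.
    """
    return [tag for tag, z in logits.items() if sigmoid(z) >= threshold]


# Hypothetical per-tag logits for one clip (illustrative numbers only).
clip_logits = {"speech": 2.3, "music": -1.1, "dog_bark": 0.4, "siren": -3.0}
print(predict_tags(clip_logits))  # ['speech', 'dog_bark']
```

Because each tag is thresholded separately, the threshold can be tuned per tag when the classes are imbalanced.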

Greatest papers with code

VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text

google-research/google-research NeurIPS 2021

We train VATT end-to-end from scratch using multimodal contrastive losses and evaluate its performance on the downstream tasks of video action recognition, audio event classification, image classification, and text-to-video retrieval.

Ranked #1 on Action Classification on Moments in Time (using extra training data)

Action Classification • Action Recognition • +7 more
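VATT's training objective is contrastive. Embeddings of co-occurring modalities, such as a video clip and its own audio, are pulled together, while mismatched pairs in the same batch are pushed apart. Below is a minimal sketch of a generic symmetric InfoNCE loss between two modalities. It assumes the embeddings are already computed, and the temperature, vectors, and function names are hypothetical. As we understand it, VATT's actual losses (NCE for video-audio, MIL-NCE for video-text) differ in their details.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid dividing by zero
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchors: List[Vector], positives: List[Vector], temperature: float = 0.07) -> float:
    """Mean cross-entropy of matching anchors[i] to positives[i] among all positives.

    Row i of the similarity matrix treats positives[i] as the correct "class"
    and every other positives[j] in the batch as a negative.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [dot(ai, pj) / temperature for pj in p]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]
    return total / len(a)


def symmetric_contrastive_loss(video: List[Vector], audio: List[Vector], temperature: float = 0.07) -> float:
    """Average of the video->audio and audio->video InfoNCE terms (hypothetical helper)."""
    return 0.5 * (info_nce(video, audio, temperature) + info_nce(audio, video, temperature))


# Hypothetical 2-D embeddings for a batch of two clips.
video = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # audio i matches video i
swapped = [[0.1, 0.9], [0.9, 0.1]]   # audio pairs are mismatched
loss_aligned = symmetric_contrastive_loss(video, aligned)
loss_swapped = symmetric_contrastive_loss(video, swapped)
print(loss_aligned < loss_swapped)  # True
```

The symmetric form scores both retrieval directions. A lower temperature sharpens the softmax over the batch and places more weight on hard negatives.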

AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights

rwightman/pytorch-image-models ICLR 2021

Because of the scale invariance, this modification (removing the radial component of each update, i.e. the part parallel to the weight vector) only alters the effective step sizes without changing the effective update directions, thus retaining the original convergence properties of GD optimizers.

Audio Classification • Image Classification • +2 more
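AdamP's central step removes the radial component from an update. For a scale-invariant weight (one whose effect does not change when it is rescaled, such as a convolution followed by batch normalization), only the direction of the weight matters. The radial part of an update therefore just grows the weight norm, which shrinks later effective step sizes. Below is a minimal sketch of that projection alone. It assumes we already know the weight is scale-invariant, and it omits the detection heuristic and the Adam/SGD bookkeeping of the reference implementation. The vectors and the function name are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(v: Vector) -> float:
    return math.sqrt(dot(v, v))


def project_out_radial(weight: Vector, update: Vector, eps: float = 1e-8) -> Vector:
    """Remove the component of `update` parallel to `weight` (hypothetical helper).

    For a scale-invariant weight only its direction matters, so the radial
    part of an update changes ||w|| (and hence future effective step sizes)
    without changing the function the layer computes.
    """
    w_norm = norm(weight) + eps
    unit_w = [x / w_norm for x in weight]
    radial = dot(unit_w, update)
    return [u - radial * x for u, x in zip(update, unit_w)]


w = [3.0, 4.0]        # hypothetical scale-invariant weight, ||w|| = 5
update = [0.6, 0.2]   # hypothetical optimizer update with a radial part
p = project_out_radial(w, update)
print(abs(dot(p, w)) < 1e-6)  # True: the projected update is orthogonal to w
```

An update that points purely along `w` projects to (approximately) the zero vector. This is the case where a plain optimizer would only inflate the norm.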

Self-Supervised MultiModal Versatile Networks

deepmind/deepmind-research NeurIPS 2020

In particular, we explore how best to combine the modalities, such that fine-grained representations of the visual and audio modalities can be maintained, whilst also integrating text into a common embedding.

Action Recognition In Videos • Audio Classification • +2 more

PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition

PaddlePaddle/PaddleSpeech 23 Aug 2020

We transfer PANNs to six audio pattern recognition tasks, and demonstrate state-of-the-art performance in several of those tasks.

Audio Classification • Audio Tagging • Sound • Audio and Speech Processing

Perceiver: General Perception with Iterative Attention

lucidrains/perceiver-pytorch 4 Mar 2021

The perception models used in deep learning, on the other hand, are designed for individual modalities, often relying on domain-specific assumptions such as the local grid structures exploited by virtually all existing vision models.

3D Point Cloud Classification • Audio Classification • +1 more

AST: Audio Spectrogram Transformer

YuanGongND/ast 5 Apr 2021

In the past decade, convolutional neural networks (CNNs) have been widely adopted as the main building block for end-to-end audio classification models, which aim to learn a direct mapping from audio spectrograms to corresponding labels.

Ranked #1 on Keyword Spotting on Speech Commands (using extra training data)

Audio Classification • Audio Tagging • +3 more
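The excerpt above describes models that map audio spectrograms to labels. To make "spectrogram" concrete, here is a minimal sketch of a log-magnitude short-time Fourier transform in pure Python. The frame length, hop size, and periodic Hann window are hypothetical choices. AST and most CNN baselines actually use log-mel filterbank features computed by a signal-processing library, which this sketch does not reproduce.

```python
import cmath
import math
from typing import List


def log_spectrogram(signal: List[float], frame_len: int = 64, hop: int = 32,
                    eps: float = 1e-10) -> List[List[float]]:
    """Return a time x frequency matrix of log magnitudes (dB) via a naive STFT.

    Each frame is multiplied by a Hann window and transformed with a direct
    O(N^2) DFT; only the non-negative frequency bins 0..frame_len//2 are kept.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        row = []
        for k in range(frame_len // 2 + 1):
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len) for n, x in enumerate(chunk))
            row.append(20 * math.log10(abs(coeff) + eps))  # eps avoids log10(0)
        frames.append(row)
    return frames


# A 1 kHz tone sampled at 8 kHz: with 64-sample frames each bin spans 125 Hz,
# so the energy should peak in bin 1000 / 125 = 8.
sr, freq = 8000, 1000.0
tone = [math.sin(2 * math.pi * freq * t / sr) for t in range(512)]
spec = log_spectrogram(tone)
print(len(spec), len(spec[0]))  # 15 33
print(max(range(len(spec[0])), key=lambda k: spec[0][k]))  # 8
```

A classifier then treats this time-by-frequency matrix much like a single-channel image. This is what allows CNNs, and later AST's Transformer, to be applied to audio.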

CNN Architectures for Large-Scale Audio Classification

harritaylor/torchvggish 29 Sep 2016

Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio.

Audio Classification • Classification • +2 more

Interpreting and Explaining Deep Neural Networks for Classification of Audio Signals

soerenab/AudioMNIST 9 Jul 2018

Interpretability of deep neural networks is a recently emerging area of machine learning research targeting a better understanding of how models perform feature selection and derive their classification decisions.

Audio Classification • Decision Making • +2 more

Ubicoustics: Plug-and-Play Acoustic Activity Recognition

FIGLAB/ubicoustics 14 Oct 2018

Despite sound being a rich source of information, computing devices with microphones do not leverage audio to glean useful insights about their physical and social context.

Activity Recognition • Data Augmentation • +2 more

Multi-level Attention Model for Weakly Supervised Audio Classification

IBM/MAX-Audio-Classifier 6 Mar 2018

The objective of audio classification is to predict the presence or absence of audio events in an audio clip.

Audio Classification • Classification
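In weakly supervised audio classification only clip-level labels are available, so frame-level predictions have to be pooled into a single clip-level prediction. Below is a minimal sketch of single-level attention pooling for one event class. It assumes the frame probabilities and attention logits come from some upstream network, and all numbers are hypothetical. As we understand it, the paper's multi-level model applies attention modules at several intermediate layers and combines their outputs, which this sketch does not reproduce.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # shift by the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frame_probs: List[float], attention_logits: List[float]) -> float:
    """Clip-level probability for one event class from frame-level predictions.

    Weak supervision gives only a clip label, so the model scores every frame
    and learns where to look: the softmax-normalized attention weights decide
    how much each frame's probability contributes to the clip prediction.
    """
    weights = softmax(attention_logits)
    return sum(w * p for w, p in zip(weights, frame_probs))


# Hypothetical: an event is present only in frame 2 of a 5-frame clip.
frame_probs = [0.05, 0.10, 0.95, 0.08, 0.02]
attention_logits = [-2.0, -1.0, 3.0, -1.5, -2.5]  # the model attends to frame 2
mean_pool = sum(frame_probs) / len(frame_probs)
att_pool = attention_pool(frame_probs, attention_logits)
print(round(mean_pool, 3), round(att_pool, 3))
```

Mean pooling dilutes a short event across the whole clip. Attention pooling lets the clip prediction follow the frames where the event actually occurs.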