Audio Classification

131 papers with code • 23 benchmarks • 34 datasets

Audio Classification is a machine learning task that involves identifying audio signals and assigning them to classes or categories. The goal of audio classification is to enable machines to automatically recognize and distinguish between different types of audio, such as music, speech, and environmental sounds.



Most implemented papers

CNN Architectures for Large-Scale Audio Classification

towhee-io/towhee 29 Sep 2016

Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio.
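Applying an image-style CNN to audio usually means first converting the waveform into a 2-D time-frequency array that the network can convolve over. The sketch below computes a plain log-magnitude spectrogram with NumPy. The 400-sample frames and 160-sample hop (25 ms / 10 ms at an assumed 16 kHz sample rate) are illustrative choices. The sketch also uses linear frequency bins rather than the log-mel features many audio CNNs use, so it should not be read as the paper's exact frontend.

```python
import numpy as np

def log_spectrogram(signal, frame_len=400, hop=160, eps=1e-10):
    """Return a (num_frames, frame_len // 2 + 1) log-magnitude spectrogram."""
    window = np.hanning(frame_len)
    num_frames = 1 + (len(signal) - frame_len) // hop
    # Slice overlapping frames and taper each one with a Hann window.
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(num_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # one-sided spectrum per frame
    return np.log(magnitude + eps)                   # eps avoids log(0)

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)  # 1 s of a 1 kHz sine
spec = log_spectrogram(tone)
print(spec.shape)  # (98, 201)
```

With 40 Hz per bin (16000 / 400), the 1 kHz tone peaks in bin 25 of every frame. The resulting (time, frequency) array is what a CNN treats as a one-channel image.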

Perceiver: General Perception with Iterative Attention

deepmind/deepmind-research 4 Mar 2021

The perception models used in deep learning, on the other hand, are designed for individual modalities, often relying on domain-specific assumptions such as the local grid structures exploited by virtually all existing vision models.
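The Perceiver sidesteps such modality-specific assumptions by letting a small, fixed-size latent array cross-attend to a flat array of input elements. As a result, attention cost grows with (latents × inputs) rather than quadratically in the input length. This minimal single-head sketch uses random matrices as hypothetical stand-ins for the learned projections. It omits the normalization, MLP, and latent self-attention layers of the full model.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(latents, inputs, rng):
    """Single-head cross-attention: M latent queries attend to N inputs, cost O(M * N)."""
    d = latents.shape[1]
    c = inputs.shape[1]
    # Hypothetical random projections standing in for learned weights.
    w_q = rng.standard_normal((d, d)) / np.sqrt(d)
    w_k = rng.standard_normal((c, d)) / np.sqrt(c)
    w_v = rng.standard_normal((c, d)) / np.sqrt(c)
    q, k, v = latents @ w_q, inputs @ w_k, inputs @ w_v
    weights = softmax(q @ k.T / np.sqrt(d))  # (M, N): each latent's distribution over inputs
    return weights @ v                       # (M, d)

rng = np.random.default_rng(0)
inputs = rng.standard_normal((10_000, 32))  # e.g. 10k flattened audio elements, 32 features each
latents = rng.standard_normal((64, 128))    # small latent array
out = cross_attend(latents, inputs, rng)
print(out.shape)  # (64, 128)
```

The output size depends only on the latent array, so later layers can process it at a cost independent of the input length.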

PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition

qiuqiangkong/audioset_tagging_cnn 23 Aug 2020

We transfer PANNs to six audio pattern recognition tasks, and demonstrate state-of-the-art performance in several of those tasks.

Multi-level Attention Model for Weakly Supervised Audio Classification

IBM/MAX-Audio-Classifier 6 Mar 2018

The objective of audio classification is to predict the presence or absence of audio events in an audio clip.
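Under weak supervision only clip-level labels are available. A model therefore typically makes per-frame predictions and pools them into a clip-level presence probability for each class. The sketch shows a generic attention-pooling step of this kind. It illustrates the idea and is not the paper's multi-level architecture. The inputs `cla_logits` and `att_logits` are hypothetical outputs of some frame-level network.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_pool(cla_logits, att_logits):
    """Pool frame-level predictions of shape (T, C) into clip-level probabilities (C,)."""
    probs = sigmoid(cla_logits)                       # per-frame presence probability
    att = np.exp(att_logits - att_logits.max(axis=0))
    att = att / att.sum(axis=0)                       # softmax over time, per class
    return (att * probs).sum(axis=0)                  # attention-weighted average

T, C = 5, 3
cla = np.full((T, C), -4.0)
cla[2, 1] = 4.0   # class 1 fires strongly in frame 2 only
att = np.zeros((T, C))
att[2, 1] = 3.0   # attention for class 1 focuses on that frame
clip = attention_pool(cla, att)
print(clip.round(2))
```

The attention weights are normalized over time for each class. A single confident frame can therefore dominate the clip-level score, which suits short events in long clips.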

AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights

clovaai/AdamP ICLR 2021

Because of the scale invariance, this modification only alters the effective step sizes without changing the effective update directions, thus enjoying the original convergence properties of GD optimizers.
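For a scale-invariant weight (for example, one followed by batch normalization), the component of an update parallel to the weight only grows its norm, which shrinks the effective step size. AdamP's modification removes that radial component. The sketch shows only this projection on a single weight vector. The full optimizer also tests whether a weight appears scale-invariant before projecting, which is not reproduced here.

```python
import numpy as np

def project_tangent(weight, update, eps=1e-12):
    """Remove the component of `update` parallel to `weight` (the radial direction)."""
    w_hat = weight / (np.linalg.norm(weight) + eps)
    return update - np.dot(w_hat, update) * w_hat

w = np.array([3.0, 4.0])
u = np.array([1.0, 1.0])
u_proj = project_tangent(w, u)
print(u_proj, np.dot(u_proj, w))  # projected update is orthogonal to w
```

Because the projected update is orthogonal to `w`, a step along it changes the weight's direction without growing its norm to first order.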

LEAF: A Learnable Frontend for Audio Classification

google-research/leaf-audio 21 Jan 2021

In this work we show that we can train a single learnable frontend that outperforms mel-filterbanks on a wide range of audio signals, including speech, music, audio events and animal sounds, providing a general-purpose learned frontend for audio classification.
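LEAF replaces fixed mel-filterbanks and log compression with learnable components. One of them is a per-channel energy normalization (PCEN) layer used in place of the log. Below is a non-learnable NumPy sketch of standard PCEN on a (frames, channels) energy array. The parameter values are illustrative assumptions; in LEAF such parameters are trained per channel.

```python
import numpy as np

def pcen(energy, s=0.04, alpha=0.96, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization of a non-negative (frames, channels) array."""
    smoothed = np.empty_like(energy)
    smoothed[0] = energy[0]
    for t in range(1, energy.shape[0]):
        # First-order IIR smoother, run independently on every channel.
        smoothed[t] = (1 - s) * smoothed[t - 1] + s * energy[t]
    # Adaptive gain control followed by root compression.
    return (energy / (eps + smoothed) ** alpha + delta) ** r - delta ** r

rng = np.random.default_rng(0)
energy = rng.uniform(0.1, 1.0, size=(100, 40))  # synthetic filterbank energies
out = pcen(energy)
quiet = pcen(energy, alpha=1.0)
loud = pcen(100.0 * energy, alpha=1.0)
print(out.shape, np.abs(quiet - loud).max())
```

With `alpha=1.0` the smoother divides out any constant gain, so scaling the input 100× leaves the output essentially unchanged. A fixed log compression would instead shift every value by log(100).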

ATST: Audio Representation Learning with Teacher-Student Transformer

Audio-WestlakeU/audiossl 26 Apr 2022

Self-supervised learning (SSL) learns knowledge from a large amount of unlabeled data, and then transfers the knowledge to a specific problem with a limited amount of labeled data.
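Teacher-student SSL methods commonly keep the teacher as an exponential moving average (EMA) of the student's weights rather than training it by gradient descent. The sketch assumes that scheme and a momentum of 0.99 for illustration. The dictionary of NumPy arrays is a hypothetical stand-in for a model's parameters.

```python
import numpy as np

def ema_update(teacher, student, momentum=0.99):
    """Update teacher parameters in place as an exponential moving average of the student."""
    for name in teacher:
        teacher[name] = momentum * teacher[name] + (1 - momentum) * student[name]

teacher = {"w": np.zeros(3)}
student = {"w": np.ones(3)}  # held fixed here; in training it changes every step
for _ in range(100):
    ema_update(teacher, student)
print(teacher["w"])
```

After n updates from a zero initialization toward a fixed student, the teacher equals (1 - 0.99^n) times the student value. A higher momentum makes the teacher change more slowly and smoothly.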

Masked Autoencoders that Listen

facebookresearch/audiomae 13 Jul 2022

Following the Transformer encoder-decoder design in MAE, our Audio-MAE first encodes audio spectrogram patches with a high masking ratio, feeding only the non-masked tokens through encoder layers.
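Audio-MAE splits the spectrogram into patches, masks most of them, and encodes only the visible ones. The following sketches the patchify-and-mask step. It assumes 16 × 16 patches, a 0.8 masking ratio, and a random 1024 × 128 array standing in for a log-mel spectrogram.

```python
import numpy as np

def patchify(spec, p=16):
    """Split a (time, freq) spectrogram into flattened, non-overlapping p x p patches."""
    t, f = spec.shape
    spec = spec[: t - t % p, : f - f % p]  # drop ragged edges
    nt, nf = spec.shape[0] // p, spec.shape[1] // p
    patches = spec.reshape(nt, p, nf, p).transpose(0, 2, 1, 3)  # (nt, nf, p, p)
    return patches.reshape(nt * nf, p * p)

def random_mask(tokens, ratio, rng):
    """Keep a random (1 - ratio) subset of tokens; return them and their indices."""
    n = tokens.shape[0]
    n_keep = int(round(n * (1 - ratio)))
    keep = np.sort(rng.permutation(n)[:n_keep])
    return tokens[keep], keep

rng = np.random.default_rng(0)
spec = rng.standard_normal((1024, 128))  # e.g. 1024 frames x 128 mel bins
tokens = patchify(spec)                  # (512, 256)
visible, keep = random_mask(tokens, ratio=0.8, rng=rng)
print(tokens.shape, visible.shape)
```

Only about a fifth of the tokens (102 of 512 here) reach the encoder, which is what makes a high masking ratio cheap. `keep` records their positions so a decoder can later place mask tokens in the gaps.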

LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

pku-yuangroup/languagebind 3 Oct 2023

We thus propose VIDAL-10M, a dataset pairing Video, Infrared, Depth, and Audio with their corresponding Language.
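LanguageBind uses language as the shared anchor: embeddings from each other modality are aligned to the embeddings of their paired text. A CLIP-style symmetric contrastive (InfoNCE) loss is a common way to express that alignment. The sketch below assumes that form and a temperature of 0.07. It uses synthetic embeddings rather than real encoder outputs.

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def log_softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def contrastive_loss(modality_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE: the i-th modality item should match the i-th caption."""
    logits = l2_normalize(modality_emb) @ l2_normalize(text_emb).T / temperature
    idx = np.arange(logits.shape[0])
    loss_m2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # modality -> text
    loss_t2m = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> modality
    return (loss_m2t + loss_t2m) / 2

rng = np.random.default_rng(0)
text = rng.standard_normal((8, 64))
aligned = text + 0.01 * rng.standard_normal((8, 64))  # audio embeddings close to their captions
shuffled = aligned[::-1]                              # every pair mismatched
loss_aligned = contrastive_loss(aligned, text)
loss_shuffled = contrastive_loss(shuffled, text)
print(loss_aligned, loss_shuffled)
```

The loss is near zero when each audio embedding is closest to its own caption and large when pairs are mismatched. Training each modality's encoder against the same text space is what lets the modalities be compared through language.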

Convolutional RNN: an Enhanced Model for Extracting Features from Sequential Data

cruvadom/Convolutional-RNN 18 Feb 2016

Traditional convolutional layers extract features from patches of data by applying a non-linearity on an affine function of the input.
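The sentence above describes a standard convolutional layer. It slides a window over the input, applies an affine map (a dot product with a kernel plus a bias) to each patch, then applies a non-linearity. Here is a 1-D NumPy sketch with ReLU as the assumed non-linearity.

```python
import numpy as np

def conv1d_relu(x, kernel, bias):
    """Valid 1-D convolution (cross-correlation, as in most DL libraries) followed by ReLU."""
    k = kernel.shape[0]
    patches = np.stack([x[i : i + k] for i in range(len(x) - k + 1)])  # (n_patches, k)
    return np.maximum(patches @ kernel + bias, 0.0)  # non-linearity on an affine function

x = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
out = conv1d_relu(x, kernel=np.array([-1.0, 0.0, 1.0]), bias=-1.5)
print(out)
```

Each output is `relu(kernel · patch + bias)`. The first patch gives 1 - 1.5 = -0.5, which the ReLU clips to zero.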