Audio Classification

131 papers with code • 20 benchmarks • 34 datasets

Audio Classification is a machine learning task that assigns audio signals to classes or categories. The goal is to enable machines to automatically recognize and distinguish between different types of audio, such as music, speech, and environmental sounds.

Benchmarks

These leaderboards are used to track progress in Audio Classification.

Dataset	Best Model
AudioSet	OmniVec
ESC-50	InternVideo2
VGGSound	Mirasol3B
ICBHI Respiratory Sound Database	AST (Patch-Mix CL)
SHD	SNN with Dilated Convolution with Learnable Spacings
FSD50K	ONE-PEACE
Speech Commands	AST-S
DCASE	CrissCross (AudioSet)
Balanced Audio Set	BEATs
EPIC-KITCHENS-100	Audiovisual Masked Autoencoder (Audiovisual, Single)
SSC	SNN with Dilated Convolution with Learnable Spacings
BirdCLEF 2021	EfficientLEAF (8s)
DiCOVA	AUCO ResNet
CREMA-D	EfficientLEAF
RAVDESS	ASM-RH-A
VocalSound	VocalSound Baseline
Multimodal PISA	MMDL
UCR Time Series Classification Archive	CDIL
DEEP-VOICE: DeepFake Voice Recognition	XGBoost (330)
EPIC-SOUNDS	Mirasol3B (A+V)

Libraries

Use these libraries to find Audio Classification models and implementations.

Sreyan88/LAPE • 3 papers
towhee-io/towhee • 2 papers • 2,991 stars
google-research/leaf-audio • 2 papers • 474 stars
fschmid56/efficientat • 2 papers • 181 stars

See all 7 libraries.

Most implemented papers

CNN Architectures for Large-Scale Audio Classification

towhee-io/towhee • 29 Sep 2016

Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio.

Perceiver: General Perception with Iterative Attention

deepmind/deepmind-research • 4 Mar 2021

The perception models used in deep learning, on the other hand, are designed for individual modalities, often relying on domain-specific assumptions such as the local grid structures exploited by virtually all existing vision models.

PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition

qiuqiangkong/audioset_tagging_cnn • 23 Aug 2020

We transfer PANNs to six audio pattern recognition tasks, and demonstrate state-of-the-art performance in several of those tasks.

Multi-level Attention Model for Weakly Supervised Audio Classification

IBM/MAX-Audio-Classifier • 6 Mar 2018

The objective of audio classification is to predict the presence or absence of audio events in an audio clip.
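
The presence-or-absence framing above can be sketched as an independent yes/no decision per event class. This is a minimal illustration, not the paper's model: it assumes some classifier has already produced one raw score (logit) per class for a clip, and the class names, scores, and 0.5 threshold are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_events(logits: dict, threshold: float = 0.5) -> dict:
    """Decide per class, independently, whether the event is present in the clip."""
    return {name: sigmoid(score) >= threshold for name, score in logits.items()}


# Hypothetical clip-level scores for three event classes (assumed values).
clip_logits = {"speech": 2.1, "dog_bark": -1.3, "music": 0.4}
present = predict_events(clip_logits)
```

Because each class is thresholded on its own, a clip can be tagged with several events at once or with none, which is what distinguishes this setup from single-label classification.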

AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights

clovaai/AdamP • ICLR 2021

Because of the scale invariance, this modification only alters the effective step sizes without changing the effective update directions, thus enjoying the original convergence properties of GD optimizers.
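
The claim above can be checked numerically on a toy example. The sketch below assumes that the modification in question removes the component of an update that points along the weight vector, and that "effective" refers to the weight after normalization to unit length; both are assumptions for illustration, and the vectors and helper names are hypothetical rather than taken from the AdamP code.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale v to unit length (the 'effective' weight of a scale-invariant layer)."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def project_out_radial(w, u):
    """Remove from update u its component along w; also return the radial coefficient."""
    c = dot(w, u) / dot(w, w)
    return [ui - c * wi for ui, wi in zip(u, w)], c


w = [3.0, 4.0, 0.0]   # hypothetical weight vector
u = [0.3, -0.1, 0.2]  # hypothetical raw update (already scaled by a learning rate)
u_perp, c = project_out_radial(w, u)

# Effective weight after the plain step w - u.
plain = normalize([wi - ui for wi, ui in zip(w, u)])

# Effective weight after the projected step, with the step size rescaled by 1 / (1 - c).
scale = 1.0 / (1.0 - c)
projected = normalize([wi - scale * pi for wi, pi in zip(w, u_perp)])
```

Under these assumptions `plain` and `projected` coincide: the projected update moves the normalized weight along the same path as the raw update and differs only in how far it moves for a given step size.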

LEAF: A Learnable Frontend for Audio Classification

google-research/leaf-audio • 21 Jan 2021

In this work we show that we can train a single learnable frontend that outperforms mel-filterbanks on a wide range of audio signals, including speech, music, audio events and animal sounds, providing a general-purpose learned frontend for audio classification.

ATST: Audio Representation Learning with Teacher-Student Transformer

Audio-WestlakeU/audiossl • 26 Apr 2022

Self-supervised learning (SSL) learns knowledge from a large amount of unlabeled data, and then transfers the knowledge to a specific problem with a limited number of labeled data.

Masked Autoencoders that Listen

facebookresearch/audiomae • 13 Jul 2022

Following the Transformer encoder-decoder design in MAE, our Audio-MAE first encodes audio spectrogram patches with a high masking ratio, feeding only the non-masked tokens through encoder layers.
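
The masking step described above can be sketched in a few lines. This is an illustration of random patch masking in general, not the Audio-MAE implementation: the patch grid size, the 0.75 mask ratio, the stand-in embeddings, and the function name `random_mask` are all assumptions.

```python
import random


def random_mask(num_patches: int, mask_ratio: float, seed: int = 0):
    """Randomly split patch indices into a visible (kept) set and a masked set."""
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)
    num_keep = int(num_patches * (1.0 - mask_ratio))
    return sorted(order[:num_keep]), sorted(order[num_keep:])


# Hypothetical spectrogram split into an 8 x 64 grid of patches (512 tokens).
num_patches = 8 * 64
keep_idx, masked_idx = random_mask(num_patches, mask_ratio=0.75)

# Stand-in patch embeddings; a real model would use learned projections of each patch.
patches = [[float(i)] * 4 for i in range(num_patches)]

# Only the visible tokens are passed to the encoder.
encoder_input = [patches[i] for i in keep_idx]
```

With a high mask ratio the encoder processes only a small fraction of the tokens (128 of 512 here), which is where the compute saving of this design comes from.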
