Audio Classification

132 papers with code • 20 benchmarks • 35 datasets

Audio Classification is a machine learning task that assigns audio signals to classes or categories. The goal is to enable machines to automatically recognize and distinguish between different types of audio, such as music, speech, and environmental sounds.
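
As a rough illustration of the task, the sketch below classifies synthetic clips with NumPy only. Everything in it is an assumption made for the example and comes from none of the papers listed on this page: the 8 kHz sample rate, the two made-up classes ("low" and "high" tones), the averaged log-magnitude spectrum feature, and the nearest-centroid classifier. Real systems typically use learned models on spectrogram inputs.

```python
import numpy as np

SAMPLE_RATE = 8000  # Hz, assumed for this toy example


def make_tone(freq_hz, rng, duration_s=0.5, noise_std=0.1):
    """Synthesize a noisy sine wave standing in for a labelled audio clip."""
    t = np.arange(int(SAMPLE_RATE * duration_s)) / SAMPLE_RATE
    return np.sin(2 * np.pi * freq_hz * t) + noise_std * rng.standard_normal(t.shape)


def log_spectrum(signal, frame_len=256, hop=128):
    """Average log-magnitude spectrum over Hann-windowed frames."""
    window = np.hanning(frame_len)
    frames = [
        signal[start:start + frame_len] * window
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]
    magnitudes = np.abs(np.fft.rfft(np.stack(frames), axis=1))
    return np.log1p(magnitudes).mean(axis=0)


def fit_centroids(features, labels):
    """Return one mean feature vector per class label."""
    return {str(label): features[labels == label].mean(axis=0)
            for label in np.unique(labels)}


def predict(centroids, feature):
    """Assign the label of the closest centroid (Euclidean distance)."""
    return min(centroids, key=lambda label: np.linalg.norm(feature - centroids[label]))


rng = np.random.default_rng(0)
# Two hypothetical classes: tones near 300 Hz ("low") and near 1500 Hz ("high").
class_freqs = {"low": 300.0, "high": 1500.0}

train_x, train_y = [], []
for label, freq in class_freqs.items():
    for _ in range(10):
        clip = make_tone(freq + rng.uniform(-50, 50), rng)
        train_x.append(log_spectrum(clip))
        train_y.append(label)

centroids = fit_centroids(np.array(train_x), np.array(train_y))

# Classify an unseen clip whose pitch sits close to the "high" class.
predicted_label = predict(centroids, log_spectrum(make_tone(1480.0, rng)))
print(predicted_label)
```

The same three stages (feature extraction, model fitting, prediction) carry over to the benchmarks below, with the hand-built feature and centroid model replaced by a trained network.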

Benchmarks

These leaderboards track progress in Audio Classification.

Dataset	Best Model
AudioSet	OmniVec
ESC-50	InternVideo2
VGGSound	Mirasol3B
ICBHI Respiratory Sound Database	AST (Patch-Mix CL)
SHD	Event-SSM
FSD50K	ONE-PEACE
Speech Commands	AST-S
DCASE	CrissCross (AudioSet)
Balanced Audio Set	BEATs
SSC	Event-SSM
EPIC-KITCHENS-100	Audiovisual Masked Autoencoder (Audiovisual, Single)
BirdCLEF 2021	EfficientLEAF (8s)
DiCOVA	AUCO ResNet
CREMA-D	EfficientLEAF
RAVDESS	ASM-RH-A
VocalSound	VocalSound Baseline
Multimodal PISA	MMDL
UCR Time Series Classification Archive	CDIL
DEEP-VOICE: DeepFake Voice Recognition	XGBoost (330)
EPIC-SOUNDS	Mirasol3B (A+V)

Libraries

Use these libraries to find Audio Classification models and implementations.

Sreyan88/LAPE • 3 papers

towhee-io/towhee • 2 papers • 3,000 stars

google-research/leaf-audio • 2 papers • 474 stars

fschmid56/efficientat • 2 papers • 183 stars

See all 7 libraries.

Latest papers

Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

alibaba-damo-academy/FunASR • 14 Nov 2023 • 3,370 stars

Recently, instruction-following audio-language models have received broad attention for audio interaction with humans.

Adversarial Fine-tuning using Generated Respiratory Sound to Address Class Imbalance

kaen2891/adversarial_fine-tuning_using_generated_respiratory_sound • 11 Nov 2023

In this work, we propose a straightforward approach to augment imbalanced respiratory sound data using an audio diffusion model as a conditional neural vocoder.

Auto deep learning for bioacoustic signals

giuliotosato/autokeras-bioacustic • 8 Nov 2023

This study investigates the potential of automated deep learning to enhance the accuracy and efficiency of multi-class classification of bird vocalizations, compared against traditional manually-designed deep learning models.

Dynamic Convolutional Neural Networks as Efficient Pre-trained Audio Models

fschmid56/efficientat • 24 Oct 2023 • 183 stars

Audio Spectrogram Transformers are excellent at exploiting large datasets, creating powerful pre-trained models that surpass CNNs when fine-tuned on downstream tasks.

CLARA: Multilingual Contrastive Learning for Audio Representation Acquisition

knoriy/CLARA • 18 Oct 2023

Using a large multilingual audio corpus and self-supervised learning, CLARA develops speech representations enriched with emotions, advancing emotion-aware multilingual speech processing.

LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

PKU-YuanGroup/Video-LLaVA • 3 Oct 2023 • 2,413 stars

We thus propose VIDAL-10M, which pairs Video, Infrared, Depth, and Audio with their corresponding Language.

Audio classification with Dilated Convolution with Learnable Spacings

k-h-ismail/dilated-convolution-with-learnable-spacings-pytorch • 25 Sep 2023

Dilated convolution with learnable spacings (DCLS) is a recent convolution method in which the positions of the kernel elements are learned throughout training by backpropagation.
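
To make the idea of learnable kernel positions concrete, here is a minimal 1D sketch in NumPy. It assumes that each kernel element has a real-valued position and that its weight is spread over the two nearest integer taps by linear interpolation, which is one way to make positions differentiable. The function name `build_dense_kernel` and the interpolation scheme are illustrative assumptions, not the linked repository's API or necessarily the paper's exact method, and no training loop is shown.

```python
import numpy as np


def build_dense_kernel(weights, positions, size):
    """Spread each weight over its two nearest integer taps (linear interpolation).

    Hypothetical helper: `positions` are real-valued tap locations in [0, size - 1].
    """
    weights = np.asarray(weights, dtype=float)
    positions = np.clip(np.asarray(positions, dtype=float), 0.0, size - 1.0)
    # Left neighbour of each position, capped so that left + 1 stays in range.
    left = np.minimum(np.floor(positions).astype(int), size - 2)
    frac = positions - left
    kernel = np.zeros(size)
    # np.add.at accumulates correctly even when two elements share a tap.
    np.add.at(kernel, left, weights * (1.0 - frac))
    np.add.at(kernel, left + 1, weights * frac)
    return kernel


# Two kernel elements inside a dense kernel of 5 taps.
kernel = build_dense_kernel(weights=[1.0, 2.0], positions=[0.0, 2.5], size=5)
print(kernel)

# Nudging a position changes the dense kernel smoothly, which is what lets
# a gradient with respect to the position exist during training.
shifted = build_dense_kernel(weights=[1.0, 2.0], positions=[0.0, 2.6], size=5)
print(shifted - kernel)
```

Because the dense kernel is a piecewise-linear function of each position, an autodiff framework can backpropagate through this construction and update positions alongside weights.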
