Audio Classification

131 papers with code • 23 benchmarks • 34 datasets

Audio Classification is a machine learning task that involves identifying and tagging audio signals into different classes or categories. The goal of audio classification is to enable machines to automatically recognize and distinguish between different types of audio, such as music, speech, and environmental sounds.
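As a rough illustration of the task (not tied to any paper listed below, and with a hypothetical label set and model), a minimal pipeline converts a waveform to a log-mel spectrogram and feeds it to a small classifier:

```python
# Minimal audio-classification sketch (hypothetical classes and model,
# not tied to any paper listed on this page).
import torch
import torch.nn as nn
import torchaudio

CLASSES = ["music", "speech", "environmental"]  # hypothetical label set

mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=64)
to_db = torchaudio.transforms.AmplitudeToDB()

class SmallAudioClassifier(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, num_classes),
        )

    def forward(self, x):  # x: (batch, 1, n_mels, time)
        return self.net(x)

waveform = torch.randn(1, 16000)               # 1 second of fake 16 kHz audio
features = to_db(mel(waveform)).unsqueeze(1)   # (1, 1, 64, time) log-mel input
logits = SmallAudioClassifier(len(CLASSES))(features)
print(CLASSES[logits.argmax(dim=-1).item()])   # predicted class label
```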

Libraries

Use these libraries to find Audio Classification models and implementations

Latest papers with no code

Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities

no code yet • 9 Nov 2023

We propose a multimodal model, called Mirasol3B, consisting of an autoregressive component for the time-synchronized modalities (audio and video), and an autoregressive component for the context modalities which are not necessarily aligned in time but are still sequential.
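The sentence above describes only the high-level structure. A highly simplified sketch of two autoregressive streams, one for time-aligned audio/video tokens and one for contextual tokens, might look as follows; the layer choices, dimensions, and fusion step are assumptions for illustration, not the actual Mirasol3B design:

```python
# Simplified two-stream autoregressive sketch; GRU layers, sizes, and the
# fusion step are assumptions, not the Mirasol3B architecture itself.
import torch
import torch.nn as nn

class TwoStreamAutoregressive(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.time_aligned = nn.GRU(dim, dim, batch_first=True)  # audio/video tokens, time-synchronized
        self.contextual = nn.GRU(dim, dim, batch_first=True)    # sequential but not time-aligned (e.g. text)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, av_tokens, ctx_tokens):
        av_out, _ = self.time_aligned(av_tokens)   # (B, T_av, dim)
        ctx_out, _ = self.contextual(ctx_tokens)   # (B, T_ctx, dim)
        pooled = torch.cat([av_out[:, -1], ctx_out[:, -1]], dim=-1)
        return self.fuse(pooled)                   # joint representation

model = TwoStreamAutoregressive()
out = model(torch.randn(2, 10, 256), torch.randn(2, 5, 256))
print(out.shape)  # torch.Size([2, 256])
```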

OmniVec: Learning robust representations with cross modal sharing

no code yet • 7 Nov 2023

We demonstrate empirically that using a joint network to train across modalities leads to meaningful information sharing, which allows us to achieve state-of-the-art results on most of the benchmarks.

Explore the Effect of Data Selection on Poison Efficiency in Backdoor Attacks

no code yet • 15 Oct 2023

In this study, we focus on improving the poisoning efficiency of backdoor attacks from the sample selection perspective.

CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models

no code yet • 12 Oct 2023

In this paper, we propose CompA, a collection of two expert-annotated benchmarks with a majority of real-world audio samples, to evaluate compositional reasoning in ALMs.

Diffusion Models as Masked Audio-Video Learners

no code yet • 5 Oct 2023

Over the past several years, the synchronization between audio and visual signals has been leveraged to learn richer audio-visual representations.

Audio Contrastive based Fine-tuning

no code yet • 21 Sep 2023

Audio classification plays a crucial role in speech and sound processing tasks with a wide range of applications.

Improving Speech Recognition for African American English With Audio Classification

no code yet • 16 Sep 2023

By combining the classifier output with coarse geographic information, we can select a subset of utterances from a large corpus of untranscribed short-form queries for semi-supervised learning at scale.
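A minimal sketch of such a selection step is shown below; the field names, scores, regions, and threshold are hypothetical and only illustrate filtering a corpus by classifier confidence and coarse geography:

```python
# Illustrative data-selection sketch; the fields (classifier_score, region)
# and the threshold are hypothetical, not taken from the paper.
from dataclasses import dataclass

@dataclass
class Utterance:
    audio_id: str
    classifier_score: float  # e.g. P(dialect of interest) from an audio classifier
    region: str              # coarse geographic label

def select_for_semi_supervised(utterances, target_regions, threshold=0.8):
    """Keep utterances the classifier is confident about, within target regions."""
    return [
        u for u in utterances
        if u.classifier_score >= threshold and u.region in target_regions
    ]

corpus = [
    Utterance("utt-001", 0.91, "US-South"),
    Utterance("utt-002", 0.42, "US-West"),
]
selected = select_for_semi_supervised(corpus, target_regions={"US-South"})
print([u.audio_id for u in selected])  # ['utt-001']
```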

Exploring Meta Information for Audio-based Zero-shot Bird Classification

no code yet • 15 Sep 2023

Advances in passive acoustic monitoring and machine learning have led to the procurement of vast datasets for computational bioacoustic research.

Diverse Neural Audio Embeddings -- Bringing Features back !

no code yet • 15 Sep 2023

With the advent of modern AI architectures, there has been a shift towards end-to-end architectures.

Learning Speech Representation From Contrastive Token-Acoustic Pretraining

no code yet • 1 Sep 2023

However, existing contrastive learning methods in the audio field focus on extracting global descriptive information for downstream audio classification tasks, making them unsuitable for TTS, VC, and ASR tasks.
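To make the distinction concrete, the sketch below contrasts a clip-level (globally pooled) embedding, which suits classification, with frame-level embeddings, which tasks like TTS, VC, and ASR need; the encoder here is a stand-in, not the method proposed in the paper:

```python
# Sketch contrasting clip-level vs frame-level representations; the encoder
# is a hypothetical stand-in, not the paper's pretraining method.
import torch
import torch.nn as nn

encoder = nn.Conv1d(80, 256, kernel_size=3, padding=1)  # hypothetical frame encoder

mel_frames = torch.randn(1, 80, 200)            # (batch, n_mels, frames)
frame_embeddings = encoder(mel_frames)          # (1, 256, 200): per-frame, usable for ASR/TTS alignment
clip_embedding = frame_embeddings.mean(dim=-1)  # (1, 256): global summary, as in clip-level contrastive methods

print(frame_embeddings.shape, clip_embedding.shape)
```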