Audio Classification

133 papers with code • 20 benchmarks • 35 datasets

Audio Classification is a machine learning task in which audio signals are assigned to one of several classes or categories. The goal of audio classification is to enable machines to automatically recognize and distinguish between different types of audio, such as music, speech, and environmental sounds.
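
As a purely illustrative sketch (not drawn from any of the papers below), a minimal classifier might extract a simple spectral feature from a waveform and apply a rule to it. The feature, threshold, and class names here are all made up for demonstration:

```python
import numpy as np

def spectral_centroid(signal, sr=16000, n_fft=512):
    # Magnitude spectrum of one Hann-windowed frame; the centroid is the
    # amplitude-weighted mean frequency, a crude timbre feature.
    frame = signal[:n_fft] * np.hanning(n_fft)
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    return float(np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-9))

def classify(signal, sr=16000, threshold=1000.0):
    # Toy rule: low-centroid signals -> "tone", high-centroid -> "noise".
    # Threshold and labels are arbitrary, chosen only for this example.
    return "noise" if spectral_centroid(signal, sr) > threshold else "tone"

t = np.arange(16000) / 16000.0
low_tone = np.sin(2 * np.pi * 220 * t)        # 220 Hz sine wave
white_noise = np.random.default_rng(0).standard_normal(16000)
```

Real systems replace the hand-crafted feature and threshold with learned representations (CNNs or, increasingly, the transformer models listed below) trained on labeled audio.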

Most implemented papers

Augmenting Deep Classifiers with Polynomial Neural Networks

grigorisg9gr/polynomials-for-augmenting-nns 16 Apr 2021

The efficacy of the proposed models is evaluated on standard image and audio classification benchmarks.

VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text

google-research/google-research NeurIPS 2021

We train VATT end-to-end from scratch using multimodal contrastive losses and evaluate its performance by the downstream tasks of video action recognition, audio event classification, image classification, and text-to-video retrieval.

Efficient Training of Audio Transformers with Patchout

kkoutini/passt 11 Oct 2021

However, one of the main shortcomings of transformer models, compared to the well-established CNNs, is their computational complexity.
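
The "patchout" idea in the title can be sketched as randomly dropping spectrogram patch embeddings during training, which shortens the transformer's input sequence and thus cuts the quadratic self-attention cost. This is a simplified illustration, not the actual PaSST implementation:

```python
import numpy as np

def patchout(patches, keep_ratio=0.5, rng=None):
    """Randomly keep a subset of spectrogram patch embeddings.

    patches: (n_patches, dim) array of patch embeddings. Dropping
    patches shortens the transformer input, so self-attention cost
    (quadratic in sequence length) falls roughly by keep_ratio**2.
    Illustrative sketch only.
    """
    rng = rng or np.random.default_rng()
    n = patches.shape[0]
    keep = rng.choice(n, size=max(1, int(n * keep_ratio)), replace=False)
    return patches[np.sort(keep)]  # preserve original patch order

x = np.random.default_rng(0).standard_normal((100, 16))  # 100 patches
y = patchout(x, keep_ratio=0.5, rng=np.random.default_rng(1))
```

At inference time the full patch sequence is used; the dropout-like regularization applies only during training.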

SSAST: Self-Supervised Audio Spectrogram Transformer

YuanGongND/ssast 19 Oct 2021

However, pure Transformer models tend to require more training data compared to CNNs, and the success of the AST relies on supervised pretraining that requires a large amount of labeled data and a complex training pipeline, thus limiting the practical usage of AST.

CMKD: CNN/Transformer-Based Cross-Model Knowledge Distillation for Audio Classification

YuanGongND/ast 13 Mar 2022

Audio classification is an active research area with a wide range of applications.

MAE-AST: Masked Autoencoding Audio Spectrogram Transformer

AlanBaade/MAE-AST-Public 30 Mar 2022

In this paper, we propose a simple yet powerful improvement over the recent Self-Supervised Audio Spectrogram Transformer (SSAST) model for speech and audio classification.
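
The masked-autoencoding recipe referenced in the title can be sketched as follows: hide a large fraction of spectrogram patches, feed only the visible ones to the encoder, and task a decoder with reconstructing the rest. This is a generic MAE-style masking sketch, not the MAE-AST code itself:

```python
import numpy as np

def mae_mask(patches, mask_ratio=0.75, rng=None):
    # Split a patch sequence into visible and masked subsets. The
    # encoder processes only the visible patches, which is what makes
    # MAE-style pretraining cheap; the returned index arrays let a
    # decoder restore the original ordering for reconstruction.
    rng = rng or np.random.default_rng()
    n = patches.shape[0]
    perm = rng.permutation(n)
    n_keep = int(n * (1 - mask_ratio))
    visible_idx = np.sort(perm[:n_keep])
    masked_idx = np.sort(perm[n_keep:])
    return patches[visible_idx], visible_idx, masked_idx

x = np.random.default_rng(0).standard_normal((100, 16))  # 100 patches
visible, vis_idx, mask_idx = mae_mask(x, rng=np.random.default_rng(1))
```

With a 75% mask ratio, the encoder sees only a quarter of the patches, which is the main source of the training-efficiency gain over encoding the full sequence.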

Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation

fschmid56/efficientat 9 Nov 2022

We provide models of different complexity levels, scaling from low-complexity models up to a new state-of-the-art performance of 0.483 mAP on AudioSet.
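
The transformer-to-CNN distillation named in the title typically trains the small student to match the large teacher's softened output distribution. A minimal sketch of the standard distillation loss (KL divergence on temperature-scaled logits, not this paper's exact recipe):

```python
import numpy as np

def softmax(z, tau=1.0):
    z = z / tau
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, tau=2.0):
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by tau**2 as in classic knowledge distillation.
    p = softmax(teacher_logits, tau)
    q = softmax(student_logits, tau)
    return float(tau**2 * np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

teacher = np.array([2.0, 0.5, -1.0])
loss_match = distillation_loss(teacher, teacher)      # identical logits
loss_off = distillation_loss(np.zeros(3), teacher)    # mismatched logits
```

In practice this term is combined with the ordinary supervised loss on ground-truth labels, with a weighting between the two.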

Audiovisual Masked Autoencoders

google-research/scenic ICCV 2023

Can we leverage the audiovisual information already present in video to improve self-supervised representation learning?

BEATs: Audio Pre-Training with Acoustic Tokenizers

microsoft/unilm 18 Dec 2022

In the first iteration, we use random projection as the acoustic tokenizer to train an audio SSL model in a mask and label prediction manner.
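
A random-projection acoustic tokenizer can be sketched as projecting each feature frame with a fixed random matrix and assigning it the id of the nearest entry in a random codebook. This is a hypothetical simplification for illustration, not the BEATs implementation:

```python
import numpy as np

def random_projection_tokenizer(frames, codebook_size=8, dim=4, seed=0):
    """Assign each feature frame a discrete token id.

    frames: (n_frames, feat_dim) array. Frames are projected with a
    fixed random matrix and matched to the nearest entry of a fixed
    random codebook. These discrete ids then serve as prediction
    targets for masked self-supervised training. Illustrative only.
    """
    rng = rng_obj = np.random.default_rng(seed)
    proj = rng_obj.standard_normal((frames.shape[1], dim))
    codebook = rng_obj.standard_normal((codebook_size, dim))
    z = frames @ proj                                   # (n_frames, dim)
    dists = ((z[:, None, :] - codebook[None]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)                         # token ids

frames = np.random.default_rng(1).standard_normal((10, 16))
tokens = random_projection_tokenizer(frames)
```

Because the projection and codebook are fixed, the same input always maps to the same tokens, giving a stable (if crude) target for the first training iteration before a learned tokenizer replaces it.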

ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

OFA-Sys/ONE-PEACE 18 May 2023

In this work, we explore a scalable way for building a general representation model toward unlimited modalities.