Audio Classification

131 papers with code • 23 benchmarks • 34 datasets

Audio Classification is a machine learning task that involves identifying and tagging audio signals into different classes or categories. The goal of audio classification is to enable machines to automatically recognize and distinguish between different types of audio, such as music, speech, and environmental sounds.
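As a rough illustration of the task (not tied to any paper listed below, and with a hypothetical label set and model), a minimal pipeline converts a waveform to a log-mel spectrogram and feeds it to a small classifier:

```python
# Minimal audio-classification sketch (hypothetical classes and model,
# not tied to any paper listed on this page).
import torch
import torch.nn as nn
import torchaudio

CLASSES = ["music", "speech", "environmental"]  # hypothetical label set

mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=64)
to_db = torchaudio.transforms.AmplitudeToDB()

class SmallAudioClassifier(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, num_classes),
        )

    def forward(self, x):  # x: (batch, 1, n_mels, time)
        return self.net(x)

waveform = torch.randn(1, 16000)               # 1 second of fake 16 kHz audio
features = to_db(mel(waveform)).unsqueeze(1)   # (1, 1, 64, time) log-mel input
logits = SmallAudioClassifier(len(CLASSES))(features)
print(CLASSES[logits.argmax(dim=-1).item()])   # predicted class label
```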

Libraries

Use these libraries to find Audio Classification models and implementations

Latest papers with no code

Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities

no code yet • 9 Nov 2023

We propose a multimodal model, called Mirasol3B, consisting of an autoregressive component for the time-synchronized modalities (audio and video), and an autoregressive component for the context modalities which are not necessarily aligned in time but are still sequential.
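The sentence above describes only the high-level structure. A highly simplified sketch of two autoregressive streams, one for time-aligned audio/video tokens and one for contextual tokens, might look as follows; the layer choices, dimensions, and fusion step are assumptions for illustration, not the actual Mirasol3B design:

```python
# Simplified two-stream autoregressive sketch; GRU layers, sizes, and the
# fusion step are assumptions, not the Mirasol3B architecture itself.
import torch
import torch.nn as nn

class TwoStreamAutoregressive(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.time_aligned = nn.GRU(dim, dim, batch_first=True)  # audio/video tokens, time-synchronized
        self.contextual = nn.GRU(dim, dim, batch_first=True)    # sequential but not time-aligned (e.g. text)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, av_tokens, ctx_tokens):
        av_out, _ = self.time_aligned(av_tokens)   # (B, T_av, dim)
        ctx_out, _ = self.contextual(ctx_tokens)   # (B, T_ctx, dim)
        pooled = torch.cat([av_out[:, -1], ctx_out[:, -1]], dim=-1)
        return self.fuse(pooled)                   # joint representation

model = TwoStreamAutoregressive()
out = model(torch.randn(2, 10, 256), torch.randn(2, 5, 256))
print(out.shape)  # torch.Size([2, 256])
```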

OmniVec: Learning robust representations with cross modal sharing

no code yet • 7 Nov 2023

We demonstrate empirically that using a joint network to train across modalities leads to meaningful information sharing, which allows us to achieve state-of-the-art results on most of the benchmarks.

Explore the Effect of Data Selection on Poison Efficiency in Backdoor Attacks

no code yet • 15 Oct 2023

In this study, we focus on improving the poisoning efficiency of backdoor attacks from the sample selection perspective.

CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models

no code yet • 12 Oct 2023

In this paper, we propose CompA, a collection of two expert-annotated benchmarks with a majority of real-world audio samples, to evaluate compositional reasoning in ALMs.

Diffusion Models as Masked Audio-Video Learners

no code yet • 5 Oct 2023

Over the past several years, the synchronization between audio and visual signals has been leveraged to learn richer audio-visual representations.

Audio Contrastive based Fine-tuning

no code yet • 21 Sep 2023

Audio classification plays a crucial role in speech and sound processing tasks with a wide range of applications.

Improving Speech Recognition for African American English With Audio Classification

no code yet • 16 Sep 2023

By combining the classifier output with coarse geographic information, we can select a subset of utterances from a large corpus of untranscribed short-form queries for semi-supervised learning at scale.
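A minimal sketch of such a selection step is shown below; the field names, scores, regions, and threshold are hypothetical and only illustrate filtering a corpus by classifier confidence and coarse geography:

```python
# Illustrative data-selection sketch; the fields (classifier_score, region)
# and the threshold are hypothetical, not taken from the paper.
from dataclasses import dataclass

@dataclass
class Utterance:
    audio_id: str
    classifier_score: float  # e.g. P(dialect of interest) from an audio classifier
    region: str              # coarse geographic label

def select_for_semi_supervised(utterances, target_regions, threshold=0.8):
    """Keep utterances the classifier is confident about, within target regions."""
    return [
        u for u in utterances
        if u.classifier_score >= threshold and u.region in target_regions
    ]

corpus = [
    Utterance("utt-001", 0.91, "US-South"),
    Utterance("utt-002", 0.42, "US-West"),
]
selected = select_for_semi_supervised(corpus, target_regions={"US-South"})
print([u.audio_id for u in selected])  # ['utt-001']
```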

Exploring Meta Information for Audio-based Zero-shot Bird Classification

no code yet • 15 Sep 2023

Advances in passive acoustic monitoring and machine learning have led to the procurement of vast datasets for computational bioacoustic research.

Diverse Neural Audio Embeddings -- Bringing Features back !

no code yet • 15 Sep 2023

With the advent of modern AI architectures, there has been a shift towards end-to-end architectures.

Learning Speech Representation From Contrastive Token-Acoustic Pretraining

no code yet • 1 Sep 2023

However, existing contrastive learning methods in the audio field focus on extracting global descriptive information for downstream audio classification tasks, making them unsuitable for TTS, VC, and ASR tasks.
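To make the distinction concrete, the sketch below contrasts a clip-level (globally pooled) embedding, which suits classification, with frame-level embeddings, which tasks like TTS, VC, and ASR need; the encoder here is a stand-in, not the method proposed in the paper:

```python
# Sketch contrasting clip-level vs frame-level representations; the encoder
# is a hypothetical stand-in, not the paper's pretraining method.
import torch
import torch.nn as nn

encoder = nn.Conv1d(80, 256, kernel_size=3, padding=1)  # hypothetical frame encoder

mel_frames = torch.randn(1, 80, 200)            # (batch, n_mels, frames)
frame_embeddings = encoder(mel_frames)          # (1, 256, 200): per-frame, usable for ASR/TTS alignment
clip_embedding = frame_embeddings.mean(dim=-1)  # (1, 256): global summary, as in clip-level contrastive methods

print(frame_embeddings.shape, clip_embedding.shape)
```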