Audio Classification

131 papers with code • 23 benchmarks • 34 datasets

Audio Classification is a machine learning task in which audio signals are assigned to one or more predefined classes or categories. The goal is to enable machines to automatically recognize and distinguish between different types of audio, such as music, speech, and environmental sounds.
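To make the task definition concrete, here is a minimal sketch of an audio classifier, not any listed paper's method. It assumes synthetic sine tones stand in for real recordings, uses the zero-crossing rate as the only feature, and assigns each clip to the nearest class centroid. The class names (`low_tone`, `high_tone`), the sample rate, and all function names are hypothetical choices made for illustration.

```python
import math
import random

SAMPLE_RATE = 8000  # Hz; assumed value for this toy example


def make_tone(freq_hz, n_samples=800, noise=0.02, rng=None):
    """Synthesize a noisy sine wave as a stand-in for a real recording."""
    rng = rng or random.Random(0)
    return [
        math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE) + rng.gauss(0, noise)
        for t in range(n_samples)
    ]


def zero_crossing_rate(signal):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0))
    return crossings / (len(signal) - 1)


def fit_centroids(labelled_signals):
    """Average the feature per class: {label: mean zero-crossing rate}."""
    sums, counts = {}, {}
    for label, signal in labelled_signals:
        sums[label] = sums.get(label, 0.0) + zero_crossing_rate(signal)
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def classify(signal, centroids):
    """Return the label whose centroid is closest to the clip's feature."""
    zcr = zero_crossing_rate(signal)
    return min(centroids, key=lambda label: abs(centroids[label] - zcr))


# Build a tiny labelled training set of hypothetical classes.
rng = random.Random(42)
train = [("low_tone", make_tone(200, rng=rng)) for _ in range(5)]
train += [("high_tone", make_tone(1500, rng=rng)) for _ in range(5)]
centroids = fit_centroids(train)

# Classify two held-out clips with frequencies not seen in training.
print(classify(make_tone(250, rng=rng), centroids))
print(classify(make_tone(1400, rng=rng), centroids))
```

Practical systems typically replace the single hand-picked feature with spectrogram-based representations and the centroid rule with a trained neural network, but the overall shape (signal in, feature extraction, class label out) is the same.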

Audio-Visual Generalized Zero-Shot Learning using Pre-Trained Large Multi-Modal Models

faceonlive/ai-research 9 Apr 2024

Existing benchmarks predate the popularization of large multi-modal models such as CLIP and CLAP.

131

nEMO: Dataset of Emotional Speech in Polish

faceonlive/ai-research 9 Apr 2024

Speech emotion recognition has become increasingly important in recent years due to its potential applications in healthcare, customer service, and personalization of dialogue systems.

131

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding

opengvlab/internvideo 22 Mar 2024

We introduce InternVideo2, a new video foundation model (ViFM) that achieves state-of-the-art performance in action recognition, video-text tasks, and video-centric dialogue.

897

Leveraging Pre-Trained Autoencoders for Interpretable Prototype Learning of Music Audio

habla-liaa/encodecmae 14 Feb 2024

APNet allows prototypes' reconstruction to waveforms for interpretability relying on the nearest training data samples.

41

Learning Audio Concepts from Counterfactual Natural Language

ali-vosoughi/counterfactual-audio 10 Jan 2024

Conventional audio classification relied on predefined classes, lacking the ability to learn from free-form text.

2

Stethoscope-guided Supervised Contrastive Learning for Cross-domain Adaptation on Respiratory Sound Classification

kaen2891/stethoscope-guided_supervised_contrastive_learning 15 Dec 2023

Despite the remarkable advances in deep learning technology, achieving satisfactory performance in lung sound classification remains a challenge due to the scarcity of available data.

9

Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers

umbertocappellazzo/petl_ast 6 Dec 2023

The common modus operandi of fine-tuning large pre-trained Transformer models entails the adaptation of all their parameters (i.e., full fine-tuning).

26

Acoustic Prompt Tuning: Empowering Large Language Models with Audition Capabilities

jinhualiang/apt 30 Nov 2023

We improve the audio language model framework by using interleaved audio-text embeddings as the input sequence.

6

Investigating the Emergent Audio Classification Ability of ASR Foundation Models

julirao/whisper_audio_classification 15 Nov 2023

Text and vision foundation models can perform many tasks in a zero-shot setting, a desirable property that enables these systems to be applied in general and low-resource settings.

2

Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

alibaba-damo-academy/FunASR 14 Nov 2023

Recently, instruction-following audio-language models have received broad attention for audio interaction with humans.

3,115