Audio Tagging

41 papers with code • 1 benchmark • 8 datasets

Audio tagging is the task of predicting the tags that describe an audio clip. It covers music tagging, acoustic scene classification, audio event classification, and related tasks.
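
As a rough illustration of the task, the sketch below treats tagging as multi-label prediction: a model assigns each candidate tag a score, and every tag whose score reaches a threshold is kept. The tag names, the scores, and the 0.5 threshold are hypothetical values chosen for the example, not taken from any paper listed here.

```python
def predict_tags(scores, threshold=0.5):
    """Return every tag whose score reaches the threshold, highest score first."""
    selected = [(tag, s) for tag, s in scores.items() if s >= threshold]
    selected.sort(key=lambda pair: pair[1], reverse=True)
    return [tag for tag, _ in selected]


# Hypothetical per-tag scores that a model might output for one clip.
clip_scores = {"speech": 0.91, "music": 0.64, "dog_bark": 0.08, "traffic": 0.37}

print(predict_tags(clip_scores))                 # tags at or above 0.5
print(predict_tags(clip_scores, threshold=0.3))  # a looser threshold keeps more tags
```

Because a clip can carry several tags at once, the scores are thresholded independently rather than reduced to a single best class.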

Perceptual Musical Features for Interpretable Audio Tagging

vaslyb/perceptible-music-tagging 18 Dec 2023

In the age of music streaming platforms, the task of automatically tagging music audio has garnered significant attention, driving researchers to devise methods aimed at enhancing performance metrics on standard datasets.

5

Dynamic Convolutional Neural Networks as Efficient Pre-trained Audio Models

fschmid56/efficientat 24 Oct 2023

Audio Spectrogram Transformers are excellent at exploiting large datasets, creating powerful pre-trained models that surpass CNNs when fine-tuned on downstream tasks.

181

Audio classification with Dilated Convolution with Learnable Spacings

k-h-ismail/dilated-convolution-with-learnable-spacings-pytorch 25 Sep 2023

Dilated convolution with learnable spacings (DCLS) is a recent convolution method in which the positions of the kernel elements are learned throughout training by backpropagation.

50
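
To make the idea of learnable kernel positions concrete, here is a minimal 1D sketch. It assumes each kernel element has a real-valued position and that its weight is shared between the two nearest integer taps by linear interpolation, which keeps the dense kernel a smooth function of the position. The function name `dense_kernel` and the interpolation scheme are assumptions for illustration; the linked repository may construct kernels differently, and a real implementation would rely on a framework with automatic differentiation to learn the positions.

```python
import math


def dense_kernel(weights, positions, size):
    """Scatter each weight onto a dense 1D kernel at a real-valued position.

    A weight at position p is split between floor(p) and floor(p) + 1 in
    proportion to how close p is to each tap.
    """
    kernel = [0.0] * size
    for w, p in zip(weights, positions):
        p = min(max(p, 0.0), size - 1.0)  # keep the position inside the kernel
        left = int(math.floor(p))
        frac = p - left
        kernel[left] += w * (1.0 - frac)
        if frac > 0.0:
            kernel[left + 1] += w * frac
    return kernel


# Two kernel elements: one fixed at tap 0, one halfway between taps 2 and 3.
print(dense_kernel([1.0, 2.0], [0.0, 2.5], size=5))
```

Moving a position slightly changes how the weight is divided between neighbouring taps, which is what lets a gradient flow back to the position itself.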

Audio Tagging on an Embedded Hardware Platform

gbibbo/ai4s-embedded 15 Jun 2023

In this paper, we analyze how the performance of large-scale pretrained audio neural networks designed for audio pattern recognition changes when deployed on hardware such as a Raspberry Pi.

8

Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks

audio-westlakeu/audiossl 7 Jun 2023

In order to tackle both clip-level and frame-level tasks, this paper proposes Audio Teacher-Student Transformer (ATST), with a clip-level version (named ATST-Clip) and a frame-level version (named ATST-Frame), responsible for learning clip-level and frame-level representations, respectively.

65

E-PANNs: Sound Recognition Using Efficient Pre-trained Audio Neural Networks

arshdeep-singh-boparai/e-panns 30 May 2023

Sounds carry an abundance of information about activities and events in our everyday environment, such as traffic noise, road works, music, or people talking.

11

Robust Cross-Modal Knowledge Distillation for Unconstrained Videos

gewu-lab/cross-modal-distillation 16 Apr 2023

However, such semantic consistency from the synchronization is hard to guarantee in unconstrained videos, due to the irrelevant modality noise and differentiated semantic correlation.

5

Zorro: the masked multimodal transformer

lucidrains/zorro-pytorch 23 Jan 2023

Attention-based models are appealing for multimodal processing because inputs from multiple modalities can be concatenated and fed to a single backbone network, thus requiring very little fusion engineering.

92

Ontology-aware Learning and Evaluation for Audio Tagging

haoheliu/ontology-aware-audio-tagging 22 Nov 2022

The proposed metric, ontology-aware mean average precision (OmAP), addresses the weaknesses of mAP by utilizing the AudioSet ontology information during the evaluation.

13
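
For reference, the baseline metric that OmAP is compared against can be sketched as follows. This is plain mean average precision (mAP) only: it ranks clips per class by score, averages the precision at each positive clip, then averages across classes. It does not use the AudioSet ontology and is not the OmAP metric itself. The labels and scores are made-up values for the example.

```python
def average_precision(labels, scores):
    """AP for one class: mean of the precision at the rank of each positive clip."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(labels_per_class, scores_per_class):
    """Unweighted mean of the per-class AP values."""
    aps = [average_precision(l, s) for l, s in zip(labels_per_class, scores_per_class)]
    return sum(aps) / len(aps) if aps else 0.0


# Two classes, four clips each (hypothetical ground truth and model scores).
labels = [[1, 0, 1, 0], [0, 1, 0, 0]]
scores = [[0.9, 0.8, 0.7, 0.1], [0.2, 0.6, 0.9, 0.1]]
print(mean_average_precision(labels, scores))
```

Every class contributes equally here and every ranking mistake costs the same regardless of how closely related the confused classes are.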

Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation

fschmid56/efficientat 9 Nov 2022

We provide models of different complexity levels, scaling from low-complexity models up to a new state-of-the-art performance of .483 mAP on AudioSet.

181
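
Knowledge distillation for multi-label tagging is often written as a weighted sum of two binary cross-entropy terms: one against the ground-truth labels and one against the teacher's predicted probabilities. The sketch below shows that generic form in plain Python. The function names and the `kd_weight` default of 0.9 are hypothetical and are not claimed to match the paper's or the repository's settings.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce(target, prob, eps=1e-7):
    """Binary cross-entropy for one tag; the target may be a soft value in [0, 1]."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def distillation_loss(student_logits, teacher_logits, labels, kd_weight=0.9):
    """Mix the label loss with a loss that pulls the student toward the teacher."""
    n = len(labels)
    label_loss = sum(bce(y, sigmoid(s)) for y, s in zip(labels, student_logits)) / n
    teacher_loss = sum(
        bce(sigmoid(t), sigmoid(s)) for t, s in zip(teacher_logits, student_logits)
    ) / n
    return (1.0 - kd_weight) * label_loss + kd_weight * teacher_loss


# Hypothetical logits for two tags on one clip.
print(distillation_loss([1.5, -0.5], [2.0, -2.0], [1, 0]))
```

Setting `kd_weight` to 0 recovers ordinary supervised training, while a value of 1 trains the student on the teacher's outputs alone.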