Audio Tagging

41 papers with code • 1 benchmark • 8 datasets

Audio tagging is the task of predicting the tags of audio clips. Audio tagging tasks include music tagging, acoustic scene classification, and audio event classification.
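Audio tagging is typically framed as multi-label classification: a clip may carry several tags at once, so each tag gets an independent sigmoid score rather than a softmax over mutually exclusive classes. A minimal sketch of the inference step, where the tag set, the clip embedding, and the linear classifier weights are all illustrative placeholders:

```python
import numpy as np

TAGS = ["music", "speech", "dog_bark", "siren"]  # hypothetical tag set

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_tags(clip_embedding, weights, bias, threshold=0.5):
    """Return every tag whose independent sigmoid score exceeds the threshold."""
    logits = clip_embedding @ weights + bias
    scores = sigmoid(logits)
    return [tag for tag, s in zip(TAGS, scores) if s >= threshold]

rng = np.random.default_rng(0)
embedding = rng.normal(size=8)             # stand-in for a learned clip embedding
weights = rng.normal(size=(8, len(TAGS)))  # stand-in for trained classifier weights
bias = np.zeros(len(TAGS))

print(predict_tags(embedding, weights, bias))
```

Because the sigmoids are independent, zero, one, or several tags can fire for the same clip, which is exactly what distinguishes tagging from single-label scene classification.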

Latest papers with no code

ATGNN: Audio Tagging Graph Neural Network

no code yet • 2 Nov 2023

Deep learning models such as CNNs and Transformers have achieved impressive performance for end-to-end audio tagging.

Joint Music and Language Attention Models for Zero-shot Music Tagging

no code yet • 16 Oct 2023

However, previous music tagging research primarily focuses on closed-set music tagging tasks, which cannot generalize to new tags.

Killing two birds with one stone: Can an audio captioning system also be used for audio-text retrieval?

no code yet • 29 Aug 2023

For ATR, we propose using the standard Cross-Entropy loss values obtained for any audio/caption pair.
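The retrieval idea above can be sketched in a few lines: if a captioning model assigns a cross-entropy loss to any audio/caption pair, then ranking candidate captions by ascending loss turns the captioner into a retrieval system. A toy illustration, where the per-token probabilities are hard-coded placeholders standing in for a real captioning model's output:

```python
import numpy as np

def caption_cross_entropy(token_probs):
    """Mean negative log-likelihood of a caption's tokens (lower = better fit)."""
    return -np.mean(np.log(np.asarray(token_probs)))

def retrieve(candidates):
    """Rank candidate captions by ascending cross-entropy loss."""
    return sorted(candidates, key=lambda c: caption_cross_entropy(candidates[c]))

# Hypothetical per-token probabilities a captioning model might assign to
# each candidate caption for one audio clip of a barking dog.
candidates = {
    "a dog is barking":     [0.8, 0.7, 0.9, 0.85],
    "rain falls on a roof": [0.2, 0.1, 0.3, 0.15],
}
print(retrieve(candidates)[0])  # best-matching caption has the lowest loss
```

The appeal of this framing is that no separate retrieval head is trained: the same loss already computed for captioning doubles as a matching score.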

Compressing audio CNNs with graph centrality based filter pruning

no code yet • 5 May 2023

For large-scale CNNs such as PANNs designed for audio tagging, our method reduces computations per inference by 24% with 41% fewer parameters, with a slight improvement in performance.
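One way to read the idea (an illustrative interpretation, not necessarily the paper's exact algorithm): build a similarity graph over a layer's filters, score each filter by its degree centrality, and prune the most central filters, since high centrality marks a filter as redundant with many others. A small sketch on toy convolution filters:

```python
import numpy as np

def prune_by_centrality(filters, n_prune, sim_threshold=0.9):
    """Prune the n_prune filters with highest degree centrality in a
    cosine-similarity graph (an edge links filters more similar than
    sim_threshold); such filters are the most redundant."""
    flat = filters.reshape(len(filters), -1)
    unit = flat / np.linalg.norm(flat, axis=1, keepdims=True)
    cos = unit @ unit.T                                   # pairwise cosine similarity
    adj = (np.abs(cos) > sim_threshold) & ~np.eye(len(filters), dtype=bool)
    centrality = adj.sum(axis=1)                          # degree centrality
    prune_idx = np.argsort(centrality)[::-1][:n_prune]    # most connected first
    keep = np.setdiff1d(np.arange(len(filters)), prune_idx)
    return filters[keep], sorted(prune_idx.tolist())

rng = np.random.default_rng(1)
f = rng.normal(size=(6, 3, 3, 3))  # 6 toy 3x3x3 conv filters
f[1] = f[0] * 1.01                 # make filter 1 nearly duplicate filter 0
kept, pruned = prune_by_centrality(f, n_prune=1)
print(pruned)  # one of the near-duplicate pair is pruned
```

After pruning, the corresponding output channels (and the matching input channels of the next layer) are removed, which is what yields the compute and parameter savings reported above.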

Leveraging Audio-Tagging Assisted Sound Event Detection using Weakified Strong Labels and Frequency Dynamic Convolutions

no code yet • 25 Apr 2023

Stage-1 of our proposed framework focuses on audio-tagging (AT), which assists the sound event detection (SED) system in Stage-2.

AST-SED: An Effective Sound Event Detection Method Based on Audio Spectrogram Transformer

no code yet • 7 Mar 2023

In this paper, we propose AST-SED, an effective sound event detection (SED) method based on the audio spectrogram transformer (AST) model pretrained on the large-scale AudioSet for the audio tagging (AT) task.

Incremental Learning of Acoustic Scenes and Sound Events

no code yet • 28 Feb 2023

At the same time, its performance on the previous ASC task decreases by only 5.1 percentage points due to the additional learning of the AT task.

Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks

no code yet • 14 Dec 2022

In this paper, we focus on the cocktail fork problem, which takes a three-pronged approach to source separation by separating an audio mixture such as a movie soundtrack or podcast into the three broad categories of speech, music, and sound effects (SFX - understood to include ambient noise and natural sound events).

Machine Learning-based Classification of Birds through Birdsong

no code yet • 9 Dec 2022

Audio recognition and classification are used for many tasks and applications, including human voice recognition, music recognition, and audio tagging.

SpectNet: End-to-End Audio Signal Classification Using Learnable Spectrograms

no code yet • 17 Nov 2022

In this paper, we present SpectNet, an integrated front-end layer that extracts spectrogram features within a CNN architecture and can be used for audio pattern recognition tasks.