Audio Tagging
41 papers with code • 1 benchmark • 8 datasets
Audio tagging is the task of predicting the tags of audio clips. Audio tagging tasks include music tagging, acoustic scene classification, audio event classification, etc.
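Because a clip can carry several tags at once, audio tagging is usually framed as multi-label classification: each tag gets an independent binary decision. A minimal sketch of the prediction step, with a hypothetical tag vocabulary and made-up model logits for illustration:

```python
import numpy as np

# Hypothetical tag vocabulary; real systems use hundreds of classes
# (e.g. the AudioSet ontology).
TAGS = ["speech", "music", "dog_bark", "siren"]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_tags(logits, threshold=0.5):
    """Multi-label tagging: sigmoid + per-tag threshold, so several
    tags may fire for a single clip (unlike softmax classification)."""
    probs = sigmoid(np.asarray(logits, dtype=float))
    return [tag for tag, p in zip(TAGS, probs) if p >= threshold]

# Assumed logits for a clip containing speech over background music.
tags = predict_tags([2.0, 1.5, -3.0, -1.0])  # -> ['speech', 'music']
```

The sigmoid-per-tag design is what distinguishes tagging from single-label tasks such as acoustic scene classification, where a softmax over mutually exclusive scenes is used instead.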
Latest papers with no code
ATGNN: Audio Tagging Graph Neural Network
Deep learning models such as CNNs and Transformers have achieved impressive performance for end-to-end audio tagging.
Joint Music and Language Attention Models for Zero-shot Music Tagging
However, previous music tagging research primarily focuses on closed-set music tagging tasks, which cannot be generalized to new tags.
Killing two birds with one stone: Can an audio captioning system also be used for audio-text retrieval?
For ATR, we propose using the standard Cross-Entropy loss values obtained for any audio/caption pair.
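The idea above is that a captioning model's cross-entropy loss can double as a retrieval score: the lower the loss a captioner assigns to a caption given an audio clip, the better the pair matches. A minimal sketch, with hypothetical per-token probabilities standing in for a real captioning model's output:

```python
import math

def caption_nll(token_probs):
    """Mean per-token negative log-likelihood (cross-entropy) that a
    hypothetical captioning model assigns to a caption given an audio
    clip. Lower means the caption is more plausible for the audio."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def rank_captions(candidates):
    """Text retrieval for one audio clip: rank candidate captions by
    ascending cross-entropy, best match first."""
    return sorted(candidates, key=lambda item: caption_nll(item[1]))

# Toy per-token probabilities the captioner might assign to each
# candidate caption for a clip of a barking dog (illustrative only).
candidates = [
    ("a dog barks loudly", [0.7, 0.6, 0.8]),
    ("piano music plays", [0.2, 0.1, 0.3]),
]
best_caption = rank_captions(candidates)[0][0]  # "a dog barks loudly"
```

Audio retrieval for a given caption works symmetrically: score the caption against every clip in the collection and rank clips by the same loss.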
Compressing audio CNNs with graph centrality based filter pruning
For large-scale CNNs such as PANNs designed for audio tagging, our method reduces computation per inference by 24% with 41% fewer parameters, at a slight improvement in performance.
Leveraging Audio-Tagging Assisted Sound Event Detection using Weakified Strong Labels and Frequency Dynamic Convolutions
Stage-1 of our proposed framework focuses on audio-tagging (AT), which assists the sound event detection (SED) system in Stage-2.
AST-SED: An Effective Sound Event Detection Method Based on Audio Spectrogram Transformer
In this paper, we propose an effective sound event detection (SED) method based on the audio spectrogram transformer (AST) model, pretrained on the large-scale AudioSet for audio tagging (AT) task, termed AST-SED.
Incremental Learning of Acoustic Scenes and Sound Events
At the same time, its performance on the previous ASC task decreases by only 5.1 percentage points due to the additional learning of the AT task.
Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks
In this paper, we focus on the cocktail fork problem, which takes a three-pronged approach to source separation by separating an audio mixture, such as a movie soundtrack or podcast, into the three broad categories of speech, music, and sound effects (SFX, understood to include ambient noise and natural sound events).
Machine Learning-based Classification of Birds through Birdsong
Audio sound recognition and classification are used for many tasks and applications, including human voice recognition, music recognition, and audio tagging.
SpectNet: End-to-End Audio Signal Classification Using Learnable Spectrograms
In this paper, we present SpectNet, an integrated front-end layer that extracts spectrogram features within a CNN architecture and can be used for audio pattern recognition tasks.
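The core idea of a learnable front end is that the feature-extraction stage (normally a fixed spectrogram) is parameterized and trained jointly with the classifier. A minimal sketch of that idea, where the analysis window is the trainable parameter; this is illustrative only and not the actual SpectNet architecture:

```python
import numpy as np

class LearnableSpectrogram:
    """Sketch of a spectrogram front end whose analysis window is a
    trainable parameter (initialised to a Hann window), so the feature
    extractor can be optimised jointly with the downstream CNN.
    Hypothetical simplification, not the SpectNet design itself."""

    def __init__(self, frame_len=256, hop=128):
        self.frame_len = frame_len
        self.hop = hop
        # In a real model this would be a learnable tensor updated
        # by backpropagation; here it is just a numpy array.
        self.window = np.hanning(frame_len)

    def __call__(self, audio):
        n_frames = 1 + (len(audio) - self.frame_len) // self.hop
        frames = np.stack([
            audio[i * self.hop: i * self.hop + self.frame_len]
            for i in range(n_frames)
        ])
        # Magnitude spectrum of each windowed frame: (frames, bins).
        return np.abs(np.fft.rfft(frames * self.window, axis=1))

front_end = LearnableSpectrogram()
features = front_end(np.random.randn(16000))  # one second at 16 kHz
```

Because the window (and, in richer variants, the filterbank) receives gradients, the network can adapt its time-frequency representation to the task rather than relying on a hand-tuned spectrogram.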