Audio Tagging
41 papers with code • 1 benchmark • 8 datasets
Audio tagging is the task of predicting the tags of audio clips. Audio tagging tasks include music tagging, acoustic scene classification, audio event classification, and more.
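Because a clip may carry several tags at once, audio tagging is usually framed as multi-label classification: each tag gets an independent sigmoid score rather than a softmax over classes. A minimal sketch of that decoding step, assuming a hypothetical model that emits one logit per tag (the function and tag names are illustrative):

```python
import numpy as np

def predict_tags(logits, tag_names, threshold=0.5):
    """Turn per-tag clip-level logits into a list of predicted tags.

    Each tag is scored independently with a sigmoid, so any number of
    tags (including zero) can be active for a single clip.
    """
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    return [name for name, p in zip(tag_names, probs) if p >= threshold]

# Hypothetical logits from some clip-level model.
tags = predict_tags([2.0, -1.5, 0.7], ["speech", "music", "dog_bark"])
# Both "speech" and "dog_bark" clear the 0.5 threshold.
```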
Libraries
Use these libraries to find Audio Tagging models and implementations.

Most implemented papers
CRNNs for Urban Sound Tagging with spatiotemporal context
This paper describes CRNNs we used to participate in Task 5 of the DCASE 2020 challenge.
Efficient Training of Audio Transformers with Patchout
However, one of the main shortcomings of transformer models, compared to the well-established CNNs, is their computational complexity.
Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation
We provide models of different complexity levels, scaling from low-complexity models up to a new state-of-the-art performance of .483 mAP on AudioSet.
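mAP (mean average precision) is the standard AudioSet metric: average precision is computed per tag from the ranking of clip scores, then averaged across tags. A small sketch of that computation, assuming score and label matrices shaped (clips, tags); the function names are illustrative:

```python
import numpy as np

def average_precision(scores, labels):
    """AP for one tag: mean precision at each positive, ranked by score."""
    order = np.argsort(-np.asarray(scores))
    labels = np.asarray(labels)[order]
    hits = np.cumsum(labels)
    precisions = hits / (np.arange(len(labels)) + 1)
    return float(precisions[labels == 1].mean())

def mean_average_precision(score_matrix, label_matrix):
    """Macro-average AP over tags (AudioSet-style mAP)."""
    aps = [average_precision(s, l)
           for s, l in zip(np.asarray(score_matrix).T,
                           np.asarray(label_matrix).T)]
    return float(np.mean(aps))
```

A perfect ranking for every tag yields a mAP of 1.0; the .483 figure quoted above is on this 0-to-1 scale.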
Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks
In order to tackle both clip-level and frame-level tasks, this paper proposes Audio Teacher-Student Transformer (ATST), with a clip-level version (named ATST-Clip) and a frame-level version (named ATST-Frame), responsible for learning clip-level and frame-level representations, respectively.
Audio classification with Dilated Convolution with Learnable Spacings
Dilated convolution with learnable spacings (DCLS) is a recent convolution method in which the positions of the kernel elements are learned throughout training by backpropagation.
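The core construction in DCLS is placing kernel weights at fractional, learnable positions: each weight is split between its two nearest integer taps by linear interpolation, which keeps the dense kernel differentiable with respect to the positions. A minimal 1D sketch of that kernel construction (no autograd or training loop; the function name is illustrative):

```python
import numpy as np

def dcls_kernel_1d(weights, positions, kernel_size):
    """Build a dense 1D kernel from weights at fractional positions.

    Each weight w at position p is distributed between taps floor(p)
    and floor(p)+1 in proportion to its distance from each, so the
    resulting kernel varies smoothly as the positions move.
    """
    kernel = np.zeros(kernel_size)
    for w, p in zip(weights, positions):
        lo = int(np.floor(p))
        frac = p - lo
        kernel[lo] += w * (1.0 - frac)
        if lo + 1 < kernel_size:
            kernel[lo + 1] += w * frac
    return kernel

# One weight at position 1.25 spreads 75%/25% over taps 1 and 2.
k = dcls_kernel_1d([1.0], [1.25], kernel_size=4)
```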
Classifying Variable-Length Audio Files with All-Convolutional Networks and Masked Global Pooling
We trained a deep all-convolutional neural network with masked global pooling to perform single-label classification for acoustic scene classification and multi-label classification for domestic audio tagging in the DCASE-2016 contest.
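Masked global pooling is what lets a fixed network handle variable-length inputs: clips are zero-padded to a common length, and the pooling averages only over each clip's valid frames. A minimal numpy sketch, assuming features already padded to shape (batch, time, channels); the function name is illustrative:

```python
import numpy as np

def masked_global_mean_pool(features, lengths):
    """Mean-pool features over time, ignoring zero-padded frames.

    features: (batch, time, channels) array, zero-padded on the time axis.
    lengths:  true number of valid frames for each clip in the batch.
    """
    batch, time, _ = features.shape
    # mask[b, t] is 1.0 for valid frames of clip b, 0.0 for padding.
    mask = (np.arange(time)[None, :] < np.asarray(lengths)[:, None]).astype(float)
    summed = (features * mask[:, :, None]).sum(axis=1)
    # Divide by the true length, not the padded length.
    return summed / np.asarray(lengths, dtype=float)[:, None]
```

Without the mask, a plain mean over the padded axis would dilute short clips toward zero, biasing predictions by clip length.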
Audio Tagging With Connectionist Temporal Classification Model Using Sequential Labelled Data
To use the order information of sound events, we propose sequential labelled data (SLD), where both the presence or absence and the order information of sound events are known.
Guided learning for weakly-labeled semi-supervised sound event detection
Instead of designing a single model by considering a trade-off between the two sub-targets, we design a teacher model aiming at audio tagging to guide a student model aiming at boundary detection to learn using the unlabeled data.
Evaluation of post-processing algorithms for polyphonic sound event detection
We compared post-processing algorithms on the temporal prediction curves of two models: one based on the challenge's baseline and a Multiple Instance Learning (MIL) model.
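A typical post-processing step of this kind smooths each per-frame probability curve with a median filter and then thresholds it into event-active frames, suppressing isolated spurious spikes. A minimal sketch, with illustrative function and parameter names:

```python
import numpy as np

def median_filter_and_threshold(curve, win=5, threshold=0.5):
    """Smooth a per-frame probability curve with a median filter,
    then binarize it into event-active frames."""
    pad = win // 2
    # Edge-pad so the output keeps the same number of frames.
    padded = np.pad(np.asarray(curve, dtype=float), pad, mode="edge")
    smoothed = np.array([np.median(padded[i:i + win])
                         for i in range(len(curve))])
    return (smoothed >= threshold).astype(int)

# A one-frame spike is removed, while sustained activity survives.
spiky = median_filter_and_threshold([0.0, 0.0, 1.0, 0.0, 0.0], win=3)
```

The filter window and threshold are the knobs such comparisons tune; larger windows remove more spikes but also erode short genuine events.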
DCASENET: A joint pre-trained deep neural network for detecting and classifying acoustic scenes and events
In the acoustic scene and event literature, single-task deep neural networks are being developed, each performing one target task among diverse cross-related tasks.