Audio Tagging
41 papers with code • 1 benchmark • 8 datasets
Audio tagging is the task of predicting the tags of audio clips. Audio tagging tasks include music tagging, acoustic scene classification, and audio event classification.
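In practice the task is usually framed as multi-label classification over fixed-length clips: the model produces an independent sigmoid score per tag and is trained with a binary cross-entropy loss. The sketch below illustrates that setup with a deliberately tiny CNN over log-mel spectrograms; the architecture, tag count, and tensor shapes are illustrative assumptions, not a reference implementation.

```python
import torch
import torch.nn as nn

# Minimal multi-label audio tagger: a log-mel spectrogram goes through a small
# CNN and each tag gets an independent sigmoid score (a clip may carry several tags).
class TinyTagger(nn.Module):
    def __init__(self, n_tags: int = 10):          # tag count is an arbitrary example
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                # global pooling over time and frequency
        )
        self.head = nn.Linear(64, n_tags)

    def forward(self, mel):                         # mel: (batch, 1, n_mels, n_frames)
        h = self.conv(mel).flatten(1)
        return self.head(h)                         # raw logits, one per tag

model = TinyTagger(n_tags=10)
mel = torch.randn(4, 1, 64, 500)                    # stand-in for log-mel spectrograms
labels = torch.randint(0, 2, (4, 10)).float()       # multi-hot tag targets
loss = nn.BCEWithLogitsLoss()(model(mel), labels)   # training objective
probs = torch.sigmoid(model(mel))                   # per-tag probabilities at inference
```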
Latest papers with no code
Multi-encoder attention-based architectures for sound recognition with partial visual assistance
Large-scale sound recognition data sets typically consist of acoustic recordings obtained from multimedia libraries.
Impact of temporal resolution on convolutional recurrent networks for audio tagging and sound event detection
Many state-of-the-art systems for audio tagging and sound event detection employ convolutional recurrent neural architectures.
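As a rough illustration of the convolutional recurrent pattern such systems share, the sketch below runs a small CNN over the spectrogram, a bidirectional GRU over the resulting frame sequence, and pools frame-wise tag probabilities into a clip-level prediction. All layer sizes and the mean-pooling choice are assumptions for illustration, not taken from the paper.

```python
import torch
import torch.nn as nn

# Sketch of a convolutional recurrent tagger: conv layers extract per-frame
# features, a GRU models temporal context, and frame-wise tag probabilities
# are pooled into a clip-level prediction (frame outputs are what SED would use).
class CRNNTagger(nn.Module):
    def __init__(self, n_mels=64, n_tags=10):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d((4, 1)),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d((4, 1)),
        )
        self.gru = nn.GRU(64 * (n_mels // 16), 128, batch_first=True, bidirectional=True)
        self.frame_head = nn.Linear(256, n_tags)

    def forward(self, mel):                          # (batch, 1, n_mels, n_frames)
        h = self.cnn(mel)                            # (batch, 64, n_mels // 16, n_frames)
        h = h.permute(0, 3, 1, 2).flatten(2)         # (batch, n_frames, features)
        h, _ = self.gru(h)
        frame_probs = torch.sigmoid(self.frame_head(h))  # per-frame tag probabilities
        clip_probs = frame_probs.mean(dim=1)             # pooled clip-level tags
        return clip_probs, frame_probs

clip_probs, frame_probs = CRNNTagger()(torch.randn(2, 1, 64, 500))
```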
Improved Zero-Shot Audio Tagging & Classification with Patchout Spectrogram Transformers
Standard machine learning models for tagging and classifying acoustic signals cannot handle classes that were not seen during training.
Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer
Our key idea is to share the image modality between bi-modal image-text representations and bi-modal image-audio representations; the image modality functions as a pivot and implicitly connects audio and text in a tri-modal embedding space. In a difficult zero-shot setting with no paired audio-text data, our model demonstrates state-of-the-art zero-shot performance on the ESC50 and US8K audio classification tasks, and even surpasses the supervised state of the art for Clotho caption retrieval (with audio queries) by 2.2% R@1.
Audiovisual transfer learning for audio tagging and sound event detection
We study the merit of transfer learning for two sound recognition problems, i.e., audio tagging and sound event detection.
ERANNs: Efficient Residual Audio Neural Networks for Audio Pattern Recognition
For the RAVDESS dataset, our system is 3.3x smaller than the previous best system.
What is the ground truth? Reliability of multi-annotator data for audio tagging
Crowdsourcing has become a common approach for annotating large amounts of data.
Joint framework with deep feature distillation and adaptive focal loss for weakly supervised audio tagging and acoustic event detection
A good joint training framework is very helpful to improve the performances of weakly supervised audio tagging (AT) and acoustic event detection (AED) simultaneously.
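For reference, the standard binary focal loss (Lin et al.) is sketched below: it down-weights well-classified tags so that training on weakly labelled clips concentrates on the hard ones. This is only the generic formulation, not the adaptive variant proposed in the paper, and the hyperparameter values are conventional defaults.

```python
import torch
import torch.nn.functional as F

def binary_focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Generic binary focal loss for multi-label tagging."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)              # probability of the true label
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # class balancing weight
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

loss = binary_focal_loss(torch.randn(4, 10), torch.randint(0, 2, (4, 10)).float())
```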
Enhancing Audio Augmentation Methods with Consistency Learning
For tasks such as classification, there is a good case for learning representations of the data that are invariant to such transformations, yet this is not explicitly enforced by classification losses such as the cross-entropy loss.
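A common way to encourage such invariance is to add a consistency term that penalises disagreement between the model's predictions on two augmented views of the same clip. The sketch below shows this idea in its simplest form, with a hypothetical tagger, additive noise standing in for a real augmentation, and an MSE consistency term; none of these choices are taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical tagger standing in for any audio tagging network.
model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 100, 10))

def consistency_training_loss(view_a, view_b, targets, weight=1.0):
    # view_a / view_b: two differently augmented spectrograms of the same clips
    logits_a, logits_b = model(view_a), model(view_b)
    bce = F.binary_cross_entropy_with_logits(logits_a, targets)                   # supervised term
    consistency = F.mse_loss(torch.sigmoid(logits_a), torch.sigmoid(logits_b))    # invariance term
    return bce + weight * consistency

mel = torch.randn(4, 1, 64, 100)
augmented = mel + 0.1 * torch.randn_like(mel)    # stand-in for a real audio augmentation
targets = torch.randint(0, 2, (4, 10)).float()
loss = consistency_training_loss(mel, augmented, targets)
```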
Audio Tagging by Cross Filtering Noisy Labels
Yet, it is labor-intensive to accurately annotate large amounts of audio data, and the dataset may contain noisy labels in practical settings.