Audio Tagging
42 papers with code • 1 benchmark • 9 datasets
Audio tagging is the task of predicting the tags of audio clips. It includes music tagging, acoustic scene classification, and audio event classification, among others.
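One common way to frame audio tagging is as multi-label classification: a model emits one score per tag, and every tag whose score passes a threshold is assigned to the clip. The sketch below illustrates only that final decision step. The tag vocabulary, the logit values, and the `predict_tags` helper are hypothetical and are not taken from any of the papers listed here.

```python
import math

# Hypothetical tag vocabulary, for illustration only.
TAGS = ["speech", "music", "dog_bark", "rain"]


def sigmoid(x: float) -> float:
    """Map a raw logit to an independent per-tag probability."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_tags(logits, tags=TAGS, threshold=0.5):
    """Return every tag whose sigmoid score reaches the threshold.

    Tags are scored independently, so a clip can receive zero, one,
    or several tags.
    """
    if len(logits) != len(tags):
        raise ValueError("expected one logit per tag")
    scores = {tag: sigmoid(z) for tag, z in zip(tags, logits)}
    return [tag for tag, score in scores.items() if score >= threshold]


# Made-up logits standing in for the output of a trained model on one clip.
clip_logits = [2.1, -0.3, -4.0, 1.2]
predicted = predict_tags(clip_logits)
print(predicted)
```

Single-label variants such as acoustic scene classification would instead pick the one highest-scoring class rather than thresholding each tag separately.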
Most implemented papers
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition
We transfer PANNs to six audio pattern recognition tasks, and demonstrate state-of-the-art performance in several of those tasks.
Speech Denoising with Deep Feature Losses
We present an end-to-end deep learning approach to denoising speech signals by processing the raw waveform directly.
musicnn: Pre-trained convolutional neural networks for music audio tagging
Pronounced as "musician", the musicnn library contains a set of pre-trained musically motivated convolutional neural networks for music audio tagging: https://github.com/jordipons/musicnn.
AST: Audio Spectrogram Transformer
In the past decade, convolutional neural networks (CNNs) have been widely adopted as the main building block for end-to-end audio classification models, which aim to learn a direct mapping from audio spectrograms to corresponding labels.
General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline
The goal of the task is to build an audio tagging system that can recognize the category of an audio clip from a subset of 41 diverse categories drawn from the AudioSet Ontology.
Unsupervised Feature Learning Based on Deep Models for Environmental Audio Tagging
For the unsupervised feature learning, we propose to use a symmetric or asymmetric deep de-noising auto-encoder (sDAE or aDAE) to generate new data-driven features from the Mel-Filter Banks (MFBs) features.
Convolutional Gated Recurrent Neural Network Incorporating Spatial Features for Audio Tagging
In this paper, we propose to use a convolutional neural network (CNN) to extract robust features from mel-filter banks (MFBs), spectrograms or even raw waveforms for audio tagging.
Speech Denoising Convolutional Neural Network trained with Deep Feature Losses.
We present an end-to-end deep learning approach to denoising speech signals by processing the raw waveform directly.
General audio tagging with ensembling convolutional neural network and statistical features
Audio tagging is challenging due to the limited size of data and noisy labels.
Audio tagging with noisy labels and minimal supervision
The task evaluates systems for multi-label audio tagging using a large set of noisy-labeled data, and a much smaller set of manually-labeled data, under a large vocabulary setting of 80 everyday sound classes.