Audio Tagging
41 papers with code • 1 benchmark • 8 datasets
Audio tagging is the task of predicting the tags of audio clips. It covers tasks such as music tagging, acoustic scene classification, and audio event classification.
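Because a clip can carry several tags at once, audio tagging is usually framed as multi-label classification: each tag gets an independent score, and all tags above a threshold are predicted. A minimal sketch, using a made-up tag vocabulary and illustrative scores standing in for the output of a trained classifier:

```python
import numpy as np

# Hypothetical tag vocabulary; real systems use vocabularies such as
# AudioSet's 527 classes, with scores from a trained model.
TAGS = ["speech", "music", "dog_bark", "siren"]

def tag_clip(scores, threshold=0.5):
    """Turn per-tag sigmoid scores into a list of predicted tags.

    Multi-label: each score is thresholded independently rather than
    taking a single argmax, so a clip can receive several tags.
    """
    return [tag for tag, s in zip(TAGS, scores) if s >= threshold]

clip_scores = np.array([0.91, 0.12, 0.67, 0.05])  # illustrative model output
print(tag_clip(clip_scores))  # ['speech', 'dog_bark']
```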
Libraries
Use these libraries to find Audio Tagging models and implementations.
Latest papers
Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input
We propose a new method, Masked Modeling Duo (M2D), that learns representations directly while obtaining training signals using only masked patches.
Contrastive Audio-Visual Masked Autoencoder
In this paper, we first extend the recent Masked Auto-Encoder (MAE) model from a single modality to audio-visual multi-modalities.
Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer
In a difficult zero-shot setting with no paired audio-text data, our model demonstrates state-of-the-art zero-shot performance on the ESC50 and US8K audio classification tasks, and even surpasses the supervised state of the art for Clotho caption retrieval (with audio queries) by 2.2% R@1.
Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data
Our approach uses a single model for source separation of multiple sound types, and relies solely on weakly-labeled data for training.
Efficient Training of Audio Transformers with Patchout
However, one of the main shortcomings of transformer models compared to the well-established CNNs is their computational complexity.
Sound Event Detection Transformer: An Event-based End-to-End Model for Sound Event Detection
A critical issue with the frame-based model is that it pursues the best frame-level prediction rather than the best event-level prediction.
Weakly-Supervised Classification and Detection of Bird Sounds in the Wild
It is easier to hear birds than to see them; nevertheless, they play an essential role in nature and are excellent indicators of deteriorating environmental quality and pollution.
THE SJTU SYSTEM FOR DCASE2021 CHALLENGE TASK 6: AUDIO CAPTIONING BASED ON ENCODER PRE-TRAINING AND REINFORCEMENT LEARNING
This report proposes an audio captioning system for Task 6 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 challenge.
Cross-Referencing Self-Training Network for Sound Event Detection in Audio Mixtures
Sound event detection is an important facet of audio tagging that aims to identify sounds of interest and define both the sound category and time boundaries for each sound event in a continuous recording.
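Producing time boundaries typically means post-processing per-frame probabilities into event-level intervals: consecutive frames above a threshold are merged into one event with an onset and offset. A minimal sketch for a single sound class, with an assumed frame hop of 20 ms:

```python
import numpy as np

def frames_to_events(probs, frame_hop=0.02, threshold=0.5):
    """Convert per-frame probabilities for one sound class into
    (onset, offset) event intervals in seconds.

    Runs of consecutive frames at or above the threshold are merged
    into single events: the usual step turning frame-level predictions
    into event-level output.
    """
    active = probs >= threshold
    events = []
    start = None
    for i, a in enumerate(active):
        if a and start is None:
            start = i                       # event onset frame
        elif not a and start is not None:
            events.append((start * frame_hop, i * frame_hop))
            start = None                    # event closed at offset
    if start is not None:                   # event still open at clip end
        events.append((start * frame_hop, len(active) * frame_hop))
    return events

probs = np.array([0.1, 0.7, 0.8, 0.9, 0.2, 0.1, 0.6, 0.7])
print(frames_to_events(probs))  # two events: ~(0.02, 0.08) and ~(0.12, 0.16)
```

Real systems add smoothing (e.g. median filtering of the frame probabilities) and minimum-duration constraints before this merging step.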
A Modulation Front-End for Music Audio Tagging
Modulation filter bank representations, which have been actively researched as a basis for timbre perception, have the potential to facilitate the extraction of perceptually salient features.
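The idea behind such front-ends is to analyze how a signal's amplitude envelope fluctuates over time, not just its spectral content. A crude stand-in for a full modulation filter bank, sketched here with plain NumPy (rectify-and-smooth envelope extraction, then an FFT of the envelope to expose modulation rates):

```python
import numpy as np

def modulation_spectrum(x, sr, smooth_len=1024):
    """Rough modulation-spectrum sketch: extract an amplitude envelope by
    rectification and moving-average smoothing, then FFT the envelope to
    reveal slow modulation rates. A real modulation filter bank would
    instead pass the envelope through a bank of band-pass filters.
    """
    env = np.convolve(np.abs(x), np.ones(smooth_len) / smooth_len, mode="same")
    spec = np.abs(np.fft.rfft(env))
    freqs = np.fft.rfftfreq(len(env), d=1.0 / sr)
    return freqs, spec

# A 1 kHz carrier amplitude-modulated at 4 Hz should show a peak near 4 Hz.
sr = 8000
t = np.arange(sr) / sr
x = (1 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)
freqs, spec = modulation_spectrum(x, sr)
peak = freqs[1 + np.argmax(spec[1:])]  # skip the DC bin
print(peak)
```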