Audio Tagging
41 papers with code • 1 benchmark • 8 datasets
Audio tagging is the task of predicting the tags of audio clips. It covers tasks such as music tagging, acoustic scene classification, and audio event classification.
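Because a clip can carry several tags at once, audio tagging is usually framed as multi-label classification: each tag gets an independent score, and all tags above a threshold are predicted. A minimal sketch, using a made-up tag vocabulary and illustrative scores standing in for the output of a trained classifier:

```python
import numpy as np

# Hypothetical tag vocabulary; real systems use vocabularies such as
# AudioSet's 527 classes, with scores from a trained model.
TAGS = ["speech", "music", "dog_bark", "siren"]

def tag_clip(scores, threshold=0.5):
    """Turn per-tag sigmoid scores into a list of predicted tags.

    Multi-label: each score is thresholded independently rather than
    taking a single argmax, so a clip can receive several tags.
    """
    return [tag for tag, s in zip(TAGS, scores) if s >= threshold]

clip_scores = np.array([0.91, 0.12, 0.67, 0.05])  # illustrative model output
print(tag_clip(clip_scores))  # ['speech', 'dog_bark']
```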
Libraries
Use these libraries to find Audio Tagging models and implementations.
Latest papers
Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input
We propose a new method, Masked Modeling Duo (M2D), that learns representations directly while obtaining training signals using only masked patches.
Contrastive Audio-Visual Masked Autoencoder
In this paper, we first extend the recent Masked Auto-Encoder (MAE) model from a single modality to audio-visual multi-modalities.
Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer
In a difficult zero-shot setting with no paired audio-text data, our model demonstrates state-of-the-art zero-shot performance on the ESC50 and US8K audio classification tasks, and even surpasses the supervised state of the art for Clotho caption retrieval (with audio queries) by 2.2% R@1.
Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data
Our approach uses a single model for source separation of multiple sound types, and relies solely on weakly-labeled data for training.
Efficient Training of Audio Transformers with Patchout
However, one of the main shortcomings of transformer models compared to the well-established CNNs is their computational complexity.
Sound Event Detection Transformer: An Event-based End-to-End Model for Sound Event Detection
A critical issue with the frame-based model is that it pursues the best frame-level prediction rather than the best event-level prediction.
Weakly-Supervised Classification and Detection of Bird Sounds in the Wild
It is easier to hear birds than to see them; nevertheless, they play an essential role in nature and are excellent indicators of deteriorating environmental quality and pollution.
THE SJTU SYSTEM FOR DCASE2021 CHALLENGE TASK 6: AUDIO CAPTIONING BASED ON ENCODER PRE-TRAINING AND REINFORCEMENT LEARNING
This report proposes an audio captioning system for Task 6 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 challenge.
Cross-Referencing Self-Training Network for Sound Event Detection in Audio Mixtures
Sound event detection is an important facet of audio tagging that aims to identify sounds of interest and define both the sound category and time boundaries for each sound event in a continuous recording.
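Producing time boundaries typically means post-processing per-frame probabilities into event-level intervals: consecutive frames above a threshold are merged into one event with an onset and offset. A minimal sketch for a single sound class, with an assumed frame hop of 20 ms:

```python
import numpy as np

def frames_to_events(probs, frame_hop=0.02, threshold=0.5):
    """Convert per-frame probabilities for one sound class into
    (onset, offset) event intervals in seconds.

    Runs of consecutive frames at or above the threshold are merged
    into single events: the usual step turning frame-level predictions
    into event-level output.
    """
    active = probs >= threshold
    events = []
    start = None
    for i, a in enumerate(active):
        if a and start is None:
            start = i                       # event onset frame
        elif not a and start is not None:
            events.append((start * frame_hop, i * frame_hop))
            start = None                    # event closed at offset
    if start is not None:                   # event still open at clip end
        events.append((start * frame_hop, len(active) * frame_hop))
    return events

probs = np.array([0.1, 0.7, 0.8, 0.9, 0.2, 0.1, 0.6, 0.7])
print(frames_to_events(probs))  # two events: ~(0.02, 0.08) and ~(0.12, 0.16)
```

Real systems add smoothing (e.g. median filtering of the frame probabilities) and minimum-duration constraints before this merging step.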
A Modulation Front-End for Music Audio Tagging
Modulation filter bank representations, which have been actively researched as a basis for timbre perception, have the potential to facilitate the extraction of perceptually salient features.
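The idea behind such front-ends is to analyze how a signal's amplitude envelope fluctuates over time, not just its spectral content. A crude stand-in for a full modulation filter bank, sketched here with plain NumPy (rectify-and-smooth envelope extraction, then an FFT of the envelope to expose modulation rates):

```python
import numpy as np

def modulation_spectrum(x, sr, smooth_len=1024):
    """Rough modulation-spectrum sketch: extract an amplitude envelope by
    rectification and moving-average smoothing, then FFT the envelope to
    reveal slow modulation rates. A real modulation filter bank would
    instead pass the envelope through a bank of band-pass filters.
    """
    env = np.convolve(np.abs(x), np.ones(smooth_len) / smooth_len, mode="same")
    spec = np.abs(np.fft.rfft(env))
    freqs = np.fft.rfftfreq(len(env), d=1.0 / sr)
    return freqs, spec

# A 1 kHz carrier amplitude-modulated at 4 Hz should show a peak near 4 Hz.
sr = 8000
t = np.arange(sr) / sr
x = (1 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)
freqs, spec = modulation_spectrum(x, sr)
peak = freqs[1 + np.argmax(spec[1:])]  # skip the DC bin
print(peak)
```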