Audio Barlow Twins: Self-Supervised Audio Representation Learning

28 Sep 2022  ·  Jonah Anton, Harry Coppock, Pancham Shukla, Bjorn W. Schuller

The Barlow Twins self-supervised learning objective requires neither negative samples nor asymmetric learning updates, achieving results on a par with the current state-of-the-art within Computer Vision. As such, we present Audio Barlow Twins, a novel self-supervised audio representation learning approach, adapting Barlow Twins to the audio domain. We pre-train on the large-scale audio dataset AudioSet, and evaluate the quality of the learnt representations on 18 tasks from the HEAR 2021 Challenge, achieving results that outperform, or are otherwise on a par with, the current state-of-the-art for instance discrimination self-supervised learning approaches to audio representation learning. Code at
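
For readers unfamiliar with the objective, the following is a minimal NumPy sketch of the Barlow Twins loss as it is commonly formulated for vision; it is not the authors' released code. It illustrates why no negative samples or asymmetric updates are needed: both views pass through the same computation, and the loss only pushes the cross-correlation matrix of the two embedding batches towards the identity. The function name `barlow_twins_loss`, the argument names, and the default weight `lambda_offdiag=5e-3` are assumptions chosen for illustration, and the audio-specific augmentations and encoder are not shown.

```python
import numpy as np


def barlow_twins_loss(z_a, z_b, lambda_offdiag=5e-3, eps=1e-12):
    """Barlow Twins loss for two batches of embeddings of shape (n, d).

    z_a and z_b are embeddings of two augmented views of the same n inputs.
    lambda_offdiag (hypothetical default) weights the redundancy-reduction term.
    """
    n, _ = z_a.shape

    # Standardise each embedding dimension across the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)

    # Cross-correlation matrix between the two views, shape (d, d).
    c = z_a.T @ z_b / n

    # Invariance term: diagonal entries should be close to 1.
    on_diag = np.sum((np.diag(c) - 1.0) ** 2)

    # Redundancy-reduction term: off-diagonal entries should be close to 0.
    off_diag = np.sum(c ** 2) - np.sum(np.diag(c) ** 2)

    return on_diag + lambda_offdiag * off_diag


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(256, 8))
    noisy_view = z + 0.1 * rng.normal(size=z.shape)
    unrelated = rng.normal(size=(256, 8))
    print("similar views:  ", barlow_twins_loss(z, noisy_view))
    print("unrelated views:", barlow_twins_loss(z, unrelated))
```

In this sketch, two embeddings of closely related views yield a much smaller loss than two unrelated batches, because the diagonal of the cross-correlation matrix stays near 1 only when the views agree.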


Results from the Paper

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
| Event Detection | DCASE 2016 Task 2 | [ABT] AudioNTT | Onset FMS | 76.1 | # 1 |
| Environment Sound Classification | ESC-50 | [ABT] AudioNTT | Accuracy | 78.6 | # 1 |
| Environmental Sound Classification | FSD50K | [ABT] AudioNTT | mAP | 0.474 | # 1 |