Audio Barlow Twins: Self-Supervised Audio Representation Learning

28 Sep 2022  ·  Jonah Anton, Harry Coppock, Pancham Shukla, Bjorn W. Schuller

The Barlow Twins self-supervised learning objective requires neither negative samples nor asymmetric learning updates, and achieves results on a par with the current state of the art in computer vision. We therefore present Audio Barlow Twins, a novel self-supervised audio representation learning approach that adapts Barlow Twins to the audio domain. We pre-train on the large-scale audio dataset AudioSet and evaluate the quality of the learnt representations on 18 tasks from the HEAR 2021 Challenge, achieving results that outperform, or are on a par with, the current state of the art among instance-discrimination self-supervised approaches to audio representation learning. Code at
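The Barlow Twins objective referred to above can be sketched as follows. This is a minimal NumPy illustration of the loss as defined in the original Barlow Twins paper (Zbontar et al., 2021), not the Audio Barlow Twins training code: the encoder, projector and audio augmentations are omitted, the function name `barlow_twins_loss` is hypothetical, and the off-diagonal weight `lambd=5e-3` is an assumed default carried over from the vision setting. Each embedding dimension is standardised over the batch, the cross-correlation matrix between the two views is formed, and the loss pulls its diagonal towards 1 (invariance) and its off-diagonal entries towards 0 (redundancy reduction). No negative pairs appear anywhere in the computation, and both views are treated symmetrically.

```python
import numpy as np


def barlow_twins_loss(z_a: np.ndarray, z_b: np.ndarray,
                      lambd: float = 5e-3, eps: float = 1e-9) -> float:
    """Sketch of the Barlow Twins loss for two (N, D) batches of embeddings.

    z_a, z_b: embeddings of two augmented views of the same N inputs.
    lambd:    weight of the off-diagonal (redundancy-reduction) term;
              5e-3 is an assumed value, not taken from Audio Barlow Twins.
    """
    n, _ = z_a.shape
    # Standardise each embedding dimension across the batch (zero mean, unit std).
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    # Empirical cross-correlation matrix between the two views, shape (D, D).
    c = z_a.T @ z_b / n
    diag = np.diag(c)
    # Invariance term: push diagonal entries towards 1.
    on_diag = np.sum((diag - 1.0) ** 2)
    # Redundancy-reduction term: push off-diagonal entries towards 0.
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)
    return float(on_diag + lambd * off_diag)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(256, 128))
    # Two similar "views" of the same batch: lower loss.
    print(barlow_twins_loss(z, z + 0.1 * rng.normal(size=z.shape)))
    # Two unrelated batches: higher loss, since the diagonal is near 0.
    print(barlow_twins_loss(z, rng.normal(size=z.shape)))
```

Standardising with the biased (population) standard deviation mirrors the batch normalisation applied to the projector output in the original formulation, so two identical, decorrelated views give a cross-correlation matrix equal to the identity and a loss of essentially zero.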

Results from the Paper

Task                                Dataset             Model            Metric         Value   Global Rank
Event Detection                     DCASE 2016 Task 2   [ABT] AudioNTT   Onset FMS      76.1    #1
Environmental Sound Classification  ESC-50              [ABT] AudioNTT   Accuracy (%)   78.6    #1
Environmental Sound Classification  FSD50K              [ABT] AudioNTT   mAP            0.474   #1