Self-Supervised Audio Classification
7 papers with code • 2 benchmarks • 1 dataset
Most implemented papers
ATST: Audio Representation Learning with Teacher-Student Transformer
Self-supervised learning (SSL) learns representations from a large amount of unlabeled data and then transfers that knowledge to a specific problem with a limited amount of labeled data.
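A common ingredient of teacher-student SSL methods like ATST is updating the teacher as an exponential moving average (EMA) of the student, so the student is trained to match a slowly evolving target. A minimal sketch of that update (the parameter dict layout and momentum value are illustrative assumptions, not ATST's actual code):

```python
import numpy as np

def ema_update(teacher, student, momentum=0.9):
    """Move each teacher parameter a fraction (1 - momentum) toward
    the corresponding student parameter (EMA teacher update)."""
    return {name: momentum * teacher[name] + (1 - momentum) * student[name]
            for name in teacher}

# Toy parameters: the teacher starts at zero and drifts toward the student.
student = {"w": np.array([1.0, 2.0])}
teacher = {"w": np.array([0.0, 0.0])}
teacher = ema_update(teacher, student, momentum=0.9)
# teacher["w"] is now 0.9 * [0, 0] + 0.1 * [1, 2] = [0.1, 0.2]
```

Because the teacher changes slowly, its outputs provide stable targets, which helps avoid the representation collapse that plagues naive self-distillation.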
Putting An End to End-to-End: Gradient-Isolated Learning of Representations
We propose a novel deep learning method for local self-supervised representation learning that requires neither labels nor end-to-end backpropagation, instead exploiting the natural ordering of the data.
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
To the best of our knowledge, XDC is the first self-supervised learning method that outperforms large-scale fully-supervised pretraining for action recognition on the same architecture.
Audio-Visual Instance Discrimination with Cross-Modal Agreement
Our method uses contrastive learning for cross-modal discrimination of video from audio and vice-versa.
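Cross-modal contrastive objectives of this kind typically score each audio clip against every video clip in the batch, treating the matching pair as the positive and all other pairs as negatives. A minimal symmetric InfoNCE sketch, assuming precomputed embedding matrices and a temperature of 0.1 (both illustrative choices, not the paper's exact setup):

```python
import numpy as np

def cross_modal_nce(audio, video, temperature=0.1):
    """Symmetric InfoNCE over a batch of (audio, video) embedding pairs.
    Row i of `audio` and row i of `video` come from the same clip."""
    a = audio / np.linalg.norm(audio, axis=1, keepdims=True)
    v = video / np.linalg.norm(video, axis=1, keepdims=True)
    logits = a @ v.T / temperature          # (batch, batch) similarities
    targets = np.arange(len(a))             # positives lie on the diagonal

    def ce(l):
        # Row-wise cross-entropy against the diagonal targets.
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[targets, targets].mean()

    # Average the audio-to-video and video-to-audio directions.
    return 0.5 * (ce(logits) + ce(logits.T))

# Perfectly aligned, mutually orthogonal embeddings give a near-zero loss.
aligned = np.eye(4)
loss = cross_modal_nce(aligned, aligned)
```

The symmetric form ensures both encoders receive a gradient signal, so neither modality degenerates into a trivial representation.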
Self-Supervised MultiModal Versatile Networks
In particular, we explore how best to combine the modalities, such that fine-grained representations of the visual and audio modalities can be maintained, whilst also integrating text into a common embedding.
Broaden Your Views for Self-Supervised Video Learning
Most successful self-supervised learning methods are trained to align the representations of two independent views from the data.
Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity
We present CrissCross, a self-supervised framework for learning audio-visual representations.