486 papers with code • 1 benchmark • 21 datasets
Self-Supervised Learning was proposed as a way to exploit unlabeled data, building on the success of supervised learning. Producing a dataset with good labels is expensive, while unlabeled data is generated all the time. The motivation of Self-Supervised Learning is therefore to make use of this large amount of unlabeled data. The main idea is to generate labels from the unlabeled data itself, according to its structure or characteristics, and then to train on this data in a supervised manner. Self-Supervised Learning is widely used in representation learning to make a model learn the latent features of the data. The technique is often employed in computer vision, video processing, and robot control.
Image source: LeCun
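As a concrete illustration of the idea above (labels generated from the data itself, then trained on in a supervised manner), here is a minimal sketch of one classic pretext task, rotation prediction. The task and the helper name are illustrative, not tied to any specific paper listed on this page:

```python
import numpy as np

def make_rotation_task(images, rng):
    """Turn an unlabeled image collection into a supervised dataset:
    rotate each image by a random multiple of 90 degrees and use the
    rotation index (0-3) as a 'free' label derived from the data itself."""
    xs, ys = [], []
    for img in images:
        k = rng.integers(0, 4)       # 0, 90, 180, or 270 degrees
        xs.append(np.rot90(img, k))  # pretext input
        ys.append(k)                 # pretext label, no human annotation needed
    return np.stack(xs), np.array(ys)

# Tiny demo: random arrays stand in for an unlabeled image corpus.
rng = np.random.default_rng(0)
unlabeled = rng.random((8, 32, 32))
x, y = make_rotation_task(unlabeled, rng)
```

A classifier trained to predict `y` from `x` must learn features such as object orientation and shape, which is why the learned representation transfers to downstream tasks.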
We train VATT end-to-end from scratch using multimodal contrastive losses and evaluate its performance by the downstream tasks of video action recognition, audio event classification, image classification, and text-to-video retrieval.
Ranked #1 on Action Classification on Moments in Time (using extra training data)
Our representations are learned using a contrastive loss, where two augmented clips from the same short video are pulled together in the embedding space, while clips from different videos are pushed away.
Ranked #1 on Self-Supervised Action Recognition on Kinetics-600
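The contrastive loss described above (augmented clips from the same video pulled together, clips from different videos pushed apart) can be sketched as an InfoNCE-style objective. This is a generic NumPy sketch of the technique, not the exact loss from any particular paper here:

```python
import numpy as np

def info_nce_loss(z_a, z_b, temperature=0.1):
    """InfoNCE-style contrastive loss. Row i of z_a and row i of z_b are
    embeddings of two views of the same sample (positives); every other
    pairing acts as a negative."""
    # L2-normalize so dot products are cosine similarities.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature  # (N, N) similarity matrix
    # Cross-entropy with the diagonal as the target class: positives
    # are pulled together, off-diagonal pairs are pushed apart.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# Demo: correctly paired views give a lower loss than scrambled pairs.
rng = np.random.default_rng(0)
z = rng.normal(size=(16, 8))
aligned = info_nce_loss(z, z)                    # positives identical
shuffled = info_nce_loss(z, rng.permutation(z))  # positives mismatched
```

Minimizing this loss with matched pairs is what shapes the embedding space so that semantically related clips end up close together.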
This paper presents SimCLR: a simple framework for contrastive learning of visual representations.
Ranked #4 on Self-Supervised Person Re-Identification on SYSU-30k
While representations are learned from an unlabeled collection of task-related videos, robot behaviors such as pouring are learned by watching a single third-person demonstration by a human.
Ranked #3 on Video Alignment on UPenn Action
Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks.
Ranked #1 on Natural Language Inference on QNLI
We show BARThez to be very competitive with state-of-the-art BERT-based French language models such as CamemBERT and FlauBERT.
Ranked #1 on Text Summarization on OrangeSum (using extra training data)
We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler.
Ranked #1 on Speech Recognition on TIMIT (using extra training data)
Contrastive learning applied to self-supervised representation learning has seen a resurgence in recent years, leading to state-of-the-art performance in the unsupervised training of deep image models.
Ranked #247 on Image Classification on ImageNet