Self-Supervised Learning is proposed for utilizing unlabeled data with the success of supervised learning. Producing a dataset with good labels is expensive, while unlabeled data is being generated all the time. The motivation of Self-Supervised Learning is to make use of the large amount of unlabeled data. The main idea of Self-Supervised Learning is to generate the labels from unlabeled data, according to the structure or characteristics of the data itself, and then train on this unsupervised data in a supervised manner. Self-Supervised Learning is wildly used in representation learning to make a model learn the latent features of the data. This technique is often employed in computer vision, video processing and robot control.
Image source: LeCun
|TREND||DATASET||BEST METHOD||PAPER TITLE||PAPER||CODE||COMPARE|
We show that the convolution-free VATT outperforms state-of-the-art ConvNet-based architectures in the downstream tasks.
Ranked #1 on Action Classification on Moments in Time (using extra training data)
While representations are learned from an unlabeled collection of task-related videos, robot behaviors such as pouring are learned by watching a single 3rd-person demonstration by a human.
Ranked #3 on Video Alignment on UPenn Action
Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks.
Ranked #1 on Natural Language Inference on QNLI
We show BARThez to be very competitive with state-of-the-art BERT-based French language models such as CamemBERT and FlauBERT.
Ranked #1 on Text Summarization on OrangeSum (using extra training data)
We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler.
Ranked #1 on Speech Recognition on TIMIT (using extra training data)
Contrastive learning applied to self-supervised representation learning has seen a resurgence in recent years, leading to state of the art performance in the unsupervised training of deep image models.
Ranked #182 on Image Classification on ImageNet
Contrastive self-supervised learning (CSL) is an approach to learn useful representations by solving a pretext task that selects and compares anchor, negative and positive (APN) features from an unlabeled dataset.
Ranked #20 on Image Classification on STL-10
We propose vq-wav2vec to learn discrete representations of audio segments through a wav2vec-style self-supervised context prediction task.
Ranked #2 on Speech Recognition on TIMIT (using extra training data)