data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

Preprint 2022 · Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli

While the general idea of self-supervised learning is identical across modalities, the actual algorithms and objectives differ widely because they were developed with a single modality in mind. To get us closer to general self-supervised learning, we present data2vec, a framework that uses the same learning method for speech, NLP, and computer vision. The core idea is to predict latent representations of the full input data based on a masked view of the input in a self-distillation setup using a standard Transformer architecture. Instead of predicting modality-specific targets such as words, visual tokens, or units of human speech, which are local in nature, data2vec predicts contextualized latent representations that contain information from the entire input. Experiments on the major benchmarks of speech recognition, image classification, and natural language understanding demonstrate a new state of the art or performance competitive with predominant approaches.
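The abstract describes the training objective only at a high level. The following is a minimal sketch of its main ingredients in plain Python, using toy lists instead of a real Transformer. Several details are assumptions that the abstract does not state:

- The teacher is an exponential moving average (EMA) of the student.
- The targets are the average of the normalized outputs of the teacher's top-K blocks on the unmasked input.
- Normalization is a parameter-less layer norm. The paper may normalize differently per modality.
- The loss is a smooth L1 regression evaluated only at masked time steps.

All function names are hypothetical and are not part of the fairseq or Hugging Face API.

```python
import math
from typing import List, Sequence

Vector = List[float]


def layer_norm(v: Sequence[float], eps: float = 1e-5) -> Vector:
    """Parameter-less layer norm: zero mean, unit variance over the feature dim.
    (Assumed normalization; the paper may use a different one per modality.)"""
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]


def build_targets(teacher_layers: Sequence[Sequence[Vector]], top_k: int) -> List[Vector]:
    """Average the normalized outputs of the top-K teacher blocks (assumption).

    teacher_layers[l][t] is the output vector of block l at time step t,
    computed by the teacher on the full, unmasked input.
    """
    top = teacher_layers[-top_k:]
    num_steps = len(top[0])
    dim = len(top[0][0])
    targets = []
    for t in range(num_steps):
        normed = [layer_norm(layer[t]) for layer in top]
        targets.append([sum(vec[d] for vec in normed) / top_k for d in range(dim)])
    return targets


def smooth_l1(pred: float, target: float, beta: float) -> float:
    """Smooth L1: quadratic within beta of the target, linear outside."""
    diff = abs(pred - target)
    if diff <= beta:
        return 0.5 * diff ** 2 / beta
    return diff - 0.5 * beta


def masked_regression_loss(
    student_preds: Sequence[Vector],
    targets: Sequence[Vector],
    mask: Sequence[bool],
    beta: float = 1.0,
) -> float:
    """Mean smooth L1 loss over the masked time steps only."""
    losses = [
        sum(smooth_l1(p, y, beta) for p, y in zip(pred, tgt)) / len(tgt)
        for pred, tgt, m in zip(student_preds, targets, mask)
        if m
    ]
    return sum(losses) / len(losses) if losses else 0.0


def ema_update(teacher: Sequence[float], student: Sequence[float], tau: float) -> Vector:
    """Teacher weights track an EMA of the student weights (assumption);
    the teacher receives no gradient updates of its own."""
    return [tau * w_t + (1.0 - tau) * w_s for w_t, w_s in zip(teacher, student)]
```

As a usage example, `build_targets(layers, top_k=2)` returns one target vector per time step. The student's predictions at the masked positions are then scored with `masked_regression_loss`. After each student step, `ema_update` refreshes the teacher instead of backpropagating into it. The teacher sees the full input, so each target vector can carry context from the whole sequence. This is the contrast the abstract draws with local targets such as words or visual tokens.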

Code

pytorch/fairseq (official): 29,242 GitHub stars
huggingface/transformers: 124,941 GitHub stars
AryanShekarlaban/data2vec-pytorch: 161 GitHub stars
holgerbovbjerg/data2vec-kws
Guillem96/data2vec-vision
Nine implementations are listed in total.

Tasks

Image Classification

Linguistic Acceptability

Natural Language Inference

Natural Language Understanding

Paraphrase Identification

Self-Supervised Learning

Speech Recognition

Datasets

ImageNet

GLUE

SST

LibriSpeech

QNLI

MRPC

CoLA

AudioSet

Libri-Light

Quora Question Pairs

RTE

Results from the Paper

Ranked #1 on Paraphrase Identification on Quora Question Pairs (Accuracy metric)

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Linguistic Acceptability	CoLA	data2vec	Accuracy	60.3%	#30
Image Classification	ImageNet	data2vec (ViT-H)	Top-1 Accuracy	86.6%	#132
Image Classification	ImageNet	data2vec (ViT-H)	Number of params	656M	#946
Speech Recognition	LibriSpeech test-other	data2vec	Word Error Rate (WER)	3.7%	#10
Natural Language Inference	QNLI	data2vec	Accuracy	91.1%	#30
Paraphrase Identification	Quora Question Pairs	data2vec	Accuracy	92.4%	#1
Natural Language Inference	RTE	data2vec	Accuracy	69.9%	#54

Methods

Absolute Position Encodings • Adam • BPE • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer