TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Only Connect Walls Dataset Task 1 (Grouping)	OCW	E5 (BASE)	Wasserstein Distance (WD)	83.8 ± .6	# 11
Only Connect Walls Dataset Task 1 (Grouping)	OCW	E5 (BASE)	# Correct Groups	89 ± 6	# 12
Only Connect Walls Dataset Task 1 (Grouping)	OCW	E5 (BASE)	Fowlkes Mallows Score (FMS)	33.1 ± .3	# 11
Only Connect Walls Dataset Task 1 (Grouping)	OCW	E5 (BASE)	Adjusted Rand Index (ARI)	16.3 ± .4	# 11
Only Connect Walls Dataset Task 1 (Grouping)	OCW	E5 (BASE)	Adjusted Mutual Information (AMI)	19.5 ± .4	# 11
Only Connect Walls Dataset Task 1 (Grouping)	OCW	E5 (BASE)	# Solved Walls	1 ± 0	# 9
Only Connect Walls Dataset Task 1 (Grouping)	OCW	E5 (LARGE)	Wasserstein Distance (WD)	84.4 ± .7	# 13
Only Connect Walls Dataset Task 1 (Grouping)	OCW	E5 (LARGE)	# Correct Groups	76 ± 5	# 14
Only Connect Walls Dataset Task 1 (Grouping)	OCW	E5 (LARGE)	Fowlkes Mallows Score (FMS)	32.3 ± .4	# 12
Only Connect Walls Dataset Task 1 (Grouping)	OCW	E5 (LARGE)	Adjusted Rand Index (ARI)	15.4 ± .5	# 12
Only Connect Walls Dataset Task 1 (Grouping)	OCW	E5 (LARGE)	Adjusted Mutual Information (AMI)	18.5 ± .6	# 12
Only Connect Walls Dataset Task 1 (Grouping)	OCW	E5 (LARGE)	# Solved Walls	0 ± 0	# 10
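
For readers unfamiliar with the pair-counting metrics in the table, the sketch below shows how the Fowlkes Mallows Score, the Adjusted Rand Index, and a count of correct groups could be computed for a single wall. It assumes a wall of 16 clues in 4 groups of 4 and uses the textbook definitions of the metrics; the function names are hypothetical, and the benchmark's own evaluation code, including how scores are aggregated and scaled into the values above, is not shown on this page.

```python
from itertools import combinations
from math import comb, sqrt


def pair_counts(true_labels, pred_labels):
    """Count clue pairs grouped together in both, only in pred, only in truth."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    return tp, fp, fn


def fowlkes_mallows(true_labels, pred_labels):
    """Geometric mean of pairwise precision and recall."""
    tp, fp, fn = pair_counts(true_labels, pred_labels)
    if tp == 0:
        return 0.0
    return tp / sqrt((tp + fp) * (tp + fn))


def adjusted_rand_index(true_labels, pred_labels):
    """Pair agreement corrected for the agreement expected by chance."""
    tp, fp, fn = pair_counts(true_labels, pred_labels)
    total_pairs = comb(len(true_labels), 2)
    expected = (tp + fn) * (tp + fp) / total_pairs
    max_index = ((tp + fn) + (tp + fp)) / 2
    if max_index == expected:
        return 1.0
    return (tp - expected) / (max_index - expected)


def groups_of(labels):
    """Turn a label list into a set of frozensets of clue indices."""
    groups = {}
    for idx, label in enumerate(labels):
        groups.setdefault(label, set()).add(idx)
    return {frozenset(g) for g in groups.values()}


def correct_groups(true_labels, pred_labels):
    """Number of predicted groups that exactly match a true group."""
    return len(groups_of(true_labels) & groups_of(pred_labels))


# Hypothetical wall: 16 clues, 4 true groups of 4.
truth = [0] * 4 + [1] * 4 + [2] * 4 + [3] * 4
# Hypothetical guess: clues 3 and 4 are swapped between the first two groups.
guess = [0, 0, 0, 1, 0, 1, 1, 1] + [2] * 4 + [3] * 4

print(round(fowlkes_mallows(truth, guess), 4))      # 0.75
print(round(adjusted_rand_index(truth, guess), 4))  # 0.6875
print(correct_groups(truth, guess))                 # 2
```

Under these definitions a wall would count as solved only when all four predicted groups match, that is, when `correct_groups` returns 4.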

Text Embeddings by Weakly-Supervised Contrastive Pre-training

7 Dec 2022 · Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei

This paper presents E5, a family of state-of-the-art text embeddings that transfer well to a wide range of tasks. The model is trained in a contrastive manner with weak supervision signals from our curated large-scale text pair dataset (called CCPairs). E5 can be readily used as a general-purpose embedding model for any tasks requiring a single-vector representation of texts such as retrieval, clustering, and classification, achieving strong performance in both zero-shot and fine-tuned settings. We conduct extensive evaluations on 56 datasets from the BEIR and MTEB benchmarks. For zero-shot settings, E5 is the first model that outperforms the strong BM25 baseline on the BEIR retrieval benchmark without using any labeled data. When fine-tuned, E5 obtains the best results on the MTEB benchmark, beating existing embedding models with 40x more parameters.
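
To make "trained in a contrastive manner" on text pairs concrete, here is a minimal sketch of one common contrastive objective: an InfoNCE-style loss with in-batch negatives over cosine similarities. The abstract does not specify the loss, so the formulation, the function names, the temperature value, and the toy 2-dimensional "embeddings" are all assumptions made for illustration, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(queries, passages, temperature=0.05):
    """Mean InfoNCE loss; passages[i] is the positive for queries[i].

    Every other passage in the batch serves as a negative for query i.
    The temperature of 0.05 is a hypothetical choice.
    """
    losses = []
    for i, query in enumerate(queries):
        logits = [cosine(query, p) / temperature for p in passages]
        # Numerically stable log-sum-exp over all passages in the batch.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings (hypothetical): each query matches the passage at the same index.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]
shuffled = [[0.0, 1.0], [1.0, 0.0]]

print(info_nce_loss(queries, aligned) < info_nce_loss(queries, shuffled))  # True
```

The loss is near zero when each query is closest to its own paired text and grows when the pairing is broken, which is the signal a contrastive model is trained to minimize.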

Code

microsoft/unilm (official, 18,315 GitHub stars)

Tasks

Only Connect Walls Dataset Task 1 (Grouping)

Retrieval

Datasets

SST

SST-2

Natural Questions

MS MARCO

HotpotQA

FEVER

BEIR

SentEval

SciFact

MTEB

SciDocs

CLIMATE-FEVER

OCW

Results from the Paper

Ranked #11 on Only Connect Walls Dataset Task 1 (Grouping) on OCW (using extra training data)
