MTEB (Massive Text Embedding Benchmark)

Introduced by Muennighoff et al. in MTEB: Massive Text Embedding Benchmark

MTEB is a benchmark for text embedding models that spans 8 embedding tasks covering a total of 56 datasets and 112 languages. The 8 task types are Bitext Mining, Classification, Clustering, Pair Classification, Reranking, Retrieval, Semantic Textual Similarity and Summarisation. The datasets vary in text length and are grouped into three categories: Sentence to sentence, Paragraph to paragraph, and Sentence to paragraph.
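To make one of these task types concrete, the following is a minimal sketch of how a Semantic Textual Similarity task could be scored: each sentence pair is embedded, the cosine similarity of the two embeddings is taken as the predicted score, and the predictions are compared with human ratings by Spearman rank correlation. This is an illustration under stated assumptions, not the benchmark's own code: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, the sentence pairs and gold ratings are made up, and the choice of cosine similarity with Spearman correlation reflects how STS tasks are commonly scored.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def sts_score(embed, pairs, gold):
    """Score an embedding function on sentence pairs with gold ratings."""
    predicted = [cosine(embed(a), embed(b)) for a, b in pairs]
    return spearman(predicted, gold)


# Made-up sentence pairs and ratings, for illustration only.
pairs = [
    ("a cat sits on the mat", "a cat sits on the mat"),
    ("a cat sits on the mat", "a cat lies on the mat"),
    ("a cat sits on the mat", "stock prices fell sharply today"),
]
gold = [5.0, 4.0, 0.0]

score = sts_score(toy_embed, pairs, gold)
print(f"Spearman correlation: {score:.3f}")
```

Because `sts_score` takes the embedding function as an argument, a real model's encoder could be swapped in for `toy_embed` without changing the scoring logic. Rank correlation is used rather than raw agreement so that only the ordering of the predicted similarities matters, not their scale.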

The latest leaderboards are available on Hugging Face.

Benchmarks

Task	Dataset Variant	Best Model
Text Classification	MTEB	ST5-XXL
Text Clustering	MTEB	ST5-XXL
Text Retrieval	MTEB	SGPT-5.8B-msmarco
Semantic Textual Similarity	MTEB	ST5-XXL
Text Pair Classification	MTEB	GTR-XL
Text Reranking	MTEB	MPNet
Text Summarization	MTEB	MPNet-multilingual
Information Retrieval	MTEB	SGPT-5.8B-msmarco