MTEB (Massive Text Embedding Benchmark)

Introduced by Muennighoff et al. in MTEB: Massive Text Embedding Benchmark

MTEB is a benchmark that spans 8 embedding tasks covering a total of 56 datasets and 112 languages. The 8 task types are Bitext mining, Classification, Clustering, Pair Classification, Reranking, Retrieval, Semantic Textual Similarity and Summarisation. The 56 datasets contain varying text lengths and they are grouped into three categories: Sentence to sentence, Paragraph to paragraph, and Sentence to paragraph.

Check the latest leaderboards at HuggingFace.


Paper Code Results Date Stars

Dataset Loaders

No data loaders found. You can submit your data loader here.


Similar Datasets