MTEB (Massive Text Embedding Benchmark)

Introduced by Muennighoff et al. in MTEB: Massive Text Embedding Benchmark

MTEB is a benchmark that spans 8 embedding tasks covering a total of 56 datasets and 112 languages. The 8 task types are Bitext Mining, Classification, Clustering, Pair Classification, Reranking, Retrieval, Semantic Textual Similarity, and Summarisation. The 56 datasets contain texts of varying lengths and are grouped into three categories: sentence to sentence, paragraph to paragraph, and sentence to paragraph.
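The taxonomy described above can be written down as a small data structure. The following is a minimal sketch, not the benchmark's own code: the names `TaskType`, `LengthCategory`, and `describe` are hypothetical and chosen only for illustration, and no dataset names are listed because the description does not give them.

```python
from enum import Enum


class TaskType(Enum):
    """The 8 task types named in the description (hypothetical enum)."""
    BITEXT_MINING = "Bitext Mining"
    CLASSIFICATION = "Classification"
    CLUSTERING = "Clustering"
    PAIR_CLASSIFICATION = "Pair Classification"
    RERANKING = "Reranking"
    RETRIEVAL = "Retrieval"
    STS = "Semantic Textual Similarity"
    SUMMARISATION = "Summarisation"


class LengthCategory(Enum):
    """The 3 text-length groupings named in the description (hypothetical enum)."""
    S2S = "sentence to sentence"
    P2P = "paragraph to paragraph"
    S2P = "sentence to paragraph"


def describe(task: TaskType, category: LengthCategory) -> str:
    """Return a readable label for a task type paired with a length category."""
    return f"{task.value} ({category.value})"


if __name__ == "__main__":
    # Print every task type and every length category once.
    for task in TaskType:
        print(task.value)
    for category in LengthCategory:
        print(category.value)
```

A call such as `describe(TaskType.CLUSTERING, LengthCategory.P2P)` pairs a task type with a length category; which pairings actually occur among the 56 datasets is not stated here, so the sketch does not restrict them.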
