MTEB (Massive Text Embedding Benchmark)

Introduced by Muennighoff et al. in MTEB: Massive Text Embedding Benchmark

MTEB is a benchmark that spans 8 embedding task types, covering a total of 56 datasets and 112 languages. The 8 task types are Bitext Mining, Classification, Clustering, Pair Classification, Reranking, Retrieval, Semantic Textual Similarity and Summarisation. The 56 datasets contain texts of varying lengths and are grouped into three categories: Sentence to sentence, Paragraph to paragraph and Sentence to paragraph.
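
The taxonomy described above can be sketched as a small data model. This is only an illustrative sketch, not part of MTEB itself: the names `TaskType`, `LengthCategory`, `DatasetEntry` and `group_by_task` are assumptions introduced here, and the dataset names in the usage example are hypothetical placeholders rather than real MTEB datasets.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class TaskType(Enum):
    """The 8 task types listed in the description."""
    BITEXT_MINING = "Bitext Mining"
    CLASSIFICATION = "Classification"
    CLUSTERING = "Clustering"
    PAIR_CLASSIFICATION = "Pair Classification"
    RERANKING = "Reranking"
    RETRIEVAL = "Retrieval"
    STS = "Semantic Textual Similarity"
    SUMMARISATION = "Summarisation"


class LengthCategory(Enum):
    """The three text-length groupings of the datasets."""
    S2S = "Sentence to sentence"
    P2P = "Paragraph to paragraph"
    S2P = "Sentence to paragraph"


@dataclass(frozen=True)
class DatasetEntry:
    """Hypothetical record tying a dataset to its task type and length category."""
    name: str
    task_type: TaskType
    length_category: LengthCategory


def group_by_task(entries):
    """Group dataset names by task type, preserving input order."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry.task_type].append(entry.name)
    return dict(grouped)


# Placeholder entries for illustration only (not real MTEB dataset names).
examples = [
    DatasetEntry("example-dataset-a", TaskType.CLASSIFICATION, LengthCategory.S2S),
    DatasetEntry("example-dataset-b", TaskType.CLUSTERING, LengthCategory.P2P),
    DatasetEntry("example-dataset-c", TaskType.CLASSIFICATION, LengthCategory.S2S),
]
by_task = group_by_task(examples)
```

Here `by_task[TaskType.CLASSIFICATION]` holds the two placeholder classification entries, mirroring how each of the 56 datasets belongs to one task type and one length category.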


