MTEB Benchmark

7 papers with code • 1 benchmark • 1 dataset

Most implemented papers

C-Pack: Packed Resources For General Chinese Embeddings

flagopen/flagembedding 14 Sep 2023

Along with our resources on general Chinese embedding, we release our data and models for English text embeddings.

Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents

jina-ai/late-chunking 30 Oct 2023

Text embedding models have emerged as powerful tools for transforming sentences into fixed-sized feature vectors that encapsulate semantic information.

Text Embeddings by Weakly-Supervised Contrastive Pre-training

microsoft/unilm 7 Dec 2022

This paper presents E5, a family of state-of-the-art text embeddings that transfer well to a wide range of tasks.

A Bi-metric Framework for Fast Similarity Search

xuhaike/Bi-metric-search 5 Jun 2024

In both cases we show that, as long as the proxy metric used to construct the data structure approximates the ground-truth metric up to a bounded factor, our data structure achieves arbitrarily good approximation guarantees with respect to the ground-truth metric.

GenEOL: Harnessing the Generative Power of LLMs for Training-Free Sentence Embeddings

raghavlite/GenEOL 18 Oct 2024

Training-free embedding methods directly leverage pretrained large language models (LLMs) to embed text, bypassing the costly and complex procedure of contrastive learning.

KaLM-Embedding: Superior Training Data Brings A Stronger Embedding Model

HITsz-TMG/KaLM-Embedding 2 Jan 2025

As retrieval-augmented generation prevails in large language models, embedding models are becoming increasingly crucial.

NeoBERT: A Next-Generation BERT

chandar-lab/NeoBERT 26 Feb 2025

Recent innovations in architecture, pre-training, and fine-tuning have led to the remarkable in-context learning and reasoning abilities of large auto-regressive language models such as LLaMA and DeepSeek.