MTEB Benchmark
7 papers with code • 1 benchmark • 1 dataset
Benchmarks
This leaderboard is used to track progress on the MTEB Benchmark.
| Trend | Dataset | Best Model | Paper | Code | Compare |
|---|---|---|---|---|---|
Most implemented papers
C-Pack: Packed Resources For General Chinese Embeddings
Along with our resources on general Chinese embedding, we release our data and models for English text embeddings.
Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents
Text embedding models have emerged as powerful tools for transforming sentences into fixed-sized feature vectors that encapsulate semantic information.
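To make "fixed-sized feature vectors" concrete, here is a minimal sketch of embedding sentences with an off-the-shelf model. The `sentence-transformers` API and the specific model id are our assumptions for illustration, not details given on this page:

```python
# Minimal sketch: mapping sentences to fixed-size vectors with an
# off-the-shelf embedding model. Model id and API are assumptions.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("jinaai/jina-embeddings-v2-base-en",
                            trust_remote_code=True)
sentences = ["MTEB evaluates text embedding models.",
             "Embeddings map sentences to fixed-size vectors."]
embeddings = model.encode(sentences)   # shape: (2, hidden_dim)
print(embeddings.shape)
```

Every sentence comes out as a vector of the same dimensionality, which is what lets downstream tasks compare texts by simple vector similarity.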
Text Embeddings by Weakly-Supervised Contrastive Pre-training
This paper presents E5, a family of state-of-the-art text embeddings that transfer well to a wide range of tasks.
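The general recipe behind contrastive pre-training of text embedders is an InfoNCE-style loss with in-batch negatives. The sketch below shows that generic loss, not E5's exact training setup; the temperature and shapes are illustrative assumptions:

```python
# Hedged sketch of in-batch-negative contrastive loss (InfoNCE).
# Each query q[i] should score highest against its paired passage p[i];
# the other passages in the batch serve as negatives.
import torch
import torch.nn.functional as F

def info_nce(q, p, temperature=0.05):
    """q, p: (batch, dim) embeddings of paired texts (query/passage)."""
    q = F.normalize(q, dim=-1)
    p = F.normalize(p, dim=-1)
    logits = q @ p.T / temperature        # (batch, batch) similarity matrix
    targets = torch.arange(q.size(0))     # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(8, 768), torch.randn(8, 768))
```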
A Bi-metric Framework for Fast Similarity Search
We show that, as long as the proxy metric used to construct the data structure approximates the ground-truth metric up to a bounded factor, the data structure achieves arbitrarily good approximation guarantees with respect to the ground-truth metric.
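The core idea can be pictured as a two-stage retrieve-and-rerank: search with a cheap proxy metric, then refine a shortlist with the expensive ground-truth metric. The sketch below is our simplified illustration of that structure; the paper's actual data structures and guarantees are more involved:

```python
# Hedged sketch of the bi-metric idea: rank with a cheap proxy metric,
# then re-rank a shortlist with the expensive ground-truth metric.
import numpy as np

def bi_metric_search(query, corpus, proxy_dist, true_dist, k=10, overshoot=5):
    # Stage 1: score everything with the cheap proxy metric.
    proxy = np.array([proxy_dist(query, x) for x in corpus])
    candidates = np.argsort(proxy)[: k * overshoot]
    # Stage 2: refine the shortlist with the ground-truth metric.
    refined = sorted(candidates, key=lambda i: true_dist(query, corpus[i]))
    return refined[:k]
```

The better the proxy approximates the ground-truth metric, the smaller the overshoot needed to keep the true nearest neighbors inside the shortlist.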
GenEOL: Harnessing the Generative Power of LLMs for Training-Free Sentence Embeddings
Training-free embedding methods directly leverage pretrained large language models (LLMs) to embed text, bypassing the costly and complex procedure of contrastive learning.
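A baseline form of training-free embedding is to mean-pool the hidden states of a frozen pretrained LM. The sketch below shows only that baseline step under our own assumptions (illustrative model id); GenEOL's actual method additionally aggregates LLM-generated transformations of the input:

```python
# Hedged sketch: embed text by mean-pooling a frozen pretrained LM's
# hidden states, with no contrastive training. "gpt2" is illustrative.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModel.from_pretrained("gpt2")

def embed(text):
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        hidden = lm(**inputs).last_hidden_state  # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)         # mean-pool to (dim,)

vec = embed("Training-free embeddings reuse a frozen LM.")
```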
KaLM-Embedding: Superior Training Data Brings A Stronger Embedding Model
As retrieval-augmented generation prevails in large language models, embedding models are becoming increasingly crucial.
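Embedding models sit in the retrieval step of RAG: the corpus is embedded once, each query is embedded at run time, and the nearest passages are fetched by cosine similarity. A minimal sketch of that step, assuming the vectors come from any embedding model such as those above:

```python
# Hedged sketch of RAG retrieval: nearest passages by cosine similarity.
import numpy as np

def top_k_passages(query_vec, corpus_vecs, k=3):
    # Normalize so the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    C = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    scores = C @ q
    return np.argsort(-scores)[:k]   # indices of the best-matching passages
```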
NeoBERT: A Next-Generation BERT
Recent innovations in architecture, pre-training, and fine-tuning have led to the remarkable in-context learning and reasoning abilities of large auto-regressive language models such as LLaMA and DeepSeek.