Search Results

Fourier Contour Embedding for Arbitrary-Shaped Text Detection

12 code implementations · CVPR 2021

One of the main challenges for arbitrary-shaped text detection is to design a good text instance representation that allows networks to learn diverse text geometry variances.

Scene Text Detection, Text Detection
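The Fourier contour embedding in the title can be illustrated with a minimal sketch. It assumes the closed text contour has already been resampled into N points at uniform spacing and treats the contour as a complex function z(t) = x(t) + i·y(t), keeping only the Fourier coefficients c_k for k in [-k_max, k_max]. The names `fourier_coefficients`, `reconstruct` and `k_max` are hypothetical. This is not the paper's code, and it omits the network that predicts the coefficients.

```python
import cmath
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def fourier_coefficients(contour: List[Point], k_max: int) -> Dict[int, complex]:
    """Compute c_k for k in [-k_max, k_max] from N uniformly sampled contour points.

    The contour is treated as a closed complex function z(t) = x(t) + i*y(t),
    sampled at t_n = n / N.
    """
    n_pts = len(contour)
    zs = [complex(x, y) for x, y in contour]
    coeffs = {}
    for k in range(-k_max, k_max + 1):
        # Discrete approximation of c_k = integral of z(t) * exp(-2*pi*i*k*t) dt
        coeffs[k] = sum(
            z * cmath.exp(-2j * math.pi * k * n / n_pts) for n, z in enumerate(zs)
        ) / n_pts
    return coeffs


def reconstruct(coeffs: Dict[int, complex], n_samples: int) -> List[Point]:
    """Inverse transform: evaluate z(t) = sum_k c_k * exp(2*pi*i*k*t) at n_samples points."""
    points = []
    for n in range(n_samples):
        t = n / n_samples
        z = sum(c * cmath.exp(2j * math.pi * k * t) for k, c in coeffs.items())
        points.append((z.real, z.imag))
    return points


# A circle of radius 2 centred at (1, 1) needs only c_0 (the centre) and c_1 (the radius).
circle = [
    (1 + 2 * math.cos(2 * math.pi * n / 16), 1 + 2 * math.sin(2 * math.pi * n / 16))
    for n in range(16)
]
coeffs = fourier_coefficients(circle, k_max=2)
print(round(abs(coeffs[0] - (1 + 1j)), 6), round(abs(coeffs[1]), 6))  # 0.0 2.0
```

A fixed, small number of coefficients gives every text instance a representation of the same length, while higher-order terms capture more curved outlines.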

Multilingual E5 Text Embeddings: A Technical Report

1 code implementation · 8 Feb 2024

This technical report presents the training methodology and evaluation results of the open-source multilingual E5 text embedding models, released in mid-2023.

Text Embeddings by Weakly-Supervised Contrastive Pre-training

1 code implementation · 7 Dec 2022

This paper presents E5, a family of state-of-the-art text embeddings that transfer well to a wide range of tasks.

Ranked #11 on Only Connect Walls Dataset Task 1 (Grouping) on OCW (using extra training data)

MTEB Benchmark, Only Connect Walls Dataset Task 1 (Grouping)
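The weakly-supervised contrastive pre-training named in the E5 title can be sketched with a standard InfoNCE loss over in-batch negatives. This is a minimal illustration, not the E5 training code. It assumes precomputed embeddings, cosine similarity, and a temperature of 0.05 chosen only for the example, and `info_nce_loss` is a hypothetical name.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce_loss(queries: List[Vector], passages: List[Vector],
                  temperature: float = 0.05) -> float:
    """Mean InfoNCE loss: passages[i] is the positive for queries[i],
    and every other passage in the batch acts as a negative."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, p) / temperature for p in passages]
        # log-sum-exp with max subtraction for numerical stability
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]  # equals -log softmax(positive)
    return total / len(queries)


aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned, swapped)
```

With aligned toy embeddings the loss is close to zero; swapping the two passages pushes it to roughly 20 at this temperature. Reusing the other items in the batch as negatives avoids mining negatives separately, which is what makes this loss convenient for large, weakly labelled pair collections.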

BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation

4 code implementations · 5 Feb 2024

It can simultaneously perform the three common retrieval functionalities of an embedding model (dense retrieval, multi-vector retrieval, and sparse retrieval), which provides a unified model foundation for real-world IR applications.

Retrieval, Self-Knowledge Distillation
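The three retrieval functionalities in the BGE M3 snippet differ mainly in how a query and a passage are scored. The following is a minimal sketch, assuming the model has already produced, for each text, one dense vector, a term-to-weight map, and a list of per-token vectors. It does not use the FlagEmbedding library's API, all function names are hypothetical, and the hybrid weights are illustrative assumptions.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def dense_score(q: Vector, p: Vector) -> float:
    """Dense retrieval: one (normalised) vector per text, scored by inner product."""
    return dot(q, p)


def sparse_score(q_weights: Dict[str, float], p_weights: Dict[str, float]) -> float:
    """Sparse (lexical) retrieval: sum of weight products over terms in both texts."""
    return sum(w * p_weights[t] for t, w in q_weights.items() if t in p_weights)


def multi_vector_score(q_vecs: List[Vector], p_vecs: List[Vector]) -> float:
    """Multi-vector (late-interaction) retrieval: match each query token vector
    to its best passage token vector, then average those maxima."""
    return sum(max(dot(qv, pv) for pv in p_vecs) for qv in q_vecs) / len(q_vecs)


def hybrid_score(dense: float, sparse: float, multi: float,
                 weights: Tuple[float, float, float] = (1.0, 0.3, 1.0)) -> float:
    """Weighted sum of the three scores (weights chosen for illustration only)."""
    w_dense, w_sparse, w_multi = weights
    return w_dense * dense + w_sparse * sparse + w_multi * multi


d = dense_score([0.6, 0.8], [0.6, 0.8])
s = sparse_score({"text": 0.5, "embedding": 0.2}, {"embedding": 1.0, "model": 3.0})
m = multi_vector_score([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.5, 0.5]])
print(d, s, m, hybrid_score(d, s, m))
```

Because all three scores can come from one forward pass of a single model, they can be used separately or combined, which is the "unified foundation" the snippet refers to.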

VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval

1 code implementation · 6 Jun 2024

Thirdly, we introduce a multi-stage training algorithm, which first aligns the visual token embedding with the text encoder using massive weakly labeled data, and then develops multi-modal representation capability using the generated composed image-text data.

Image Retrieval, Retrieval

MTEB: Massive Text Embedding Benchmark

5 code implementations · 13 Oct 2022

MTEB spans 8 embedding tasks covering a total of 58 datasets and 112 languages.

Benchmarking, Information Retrieval

The Scandinavian Embedding Benchmarks: Comprehensive Assessment of Multilingual and Monolingual Text Embedding

2 code implementations · 4 Jun 2024

The evaluation of English text embeddings has transitioned from evaluating a handful of datasets to broad coverage across many tasks through benchmarks such as MTEB.

MMTEB: Massive Multilingual Text Embedding Benchmark

1 code implementation · 19 Feb 2025

MMTEB includes a diverse set of challenging, novel tasks such as instruction following, long-document retrieval, and code retrieval, representing the largest multilingual collection of evaluation tasks for embedding models to date.

Instruction Following, Retrieval

One Embedder, Any Task: Instruction-Finetuned Text Embeddings

4 code implementations · 19 Dec 2022

Our analysis suggests that INSTRUCTOR is robust to changes in instructions, and that instruction finetuning mitigates the challenge of training a single model on diverse datasets.

Information Retrieval, Learning Word Embeddings

Enhanced Network Embedding with Text Information

2 code implementations · 24th International Conference on Pattern Recognition (ICPR) 2018

TENE learns node representations under the guidance of both the proximity matrix, which captures the network structure, and the text cluster membership matrix, which is derived by clustering the nodes' text information.

Clustering, Multi-class Classification