Text Retrieval
237 papers with code • 5 benchmarks • 14 datasets
Most implemented papers
Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning
To improve fine-grained video-text retrieval, we propose a Hierarchical Graph Reasoning (HGR) model, which decomposes video-text matching into global-to-local levels.
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
In this paper, we leverage a noisy dataset of over one billion image alt-text pairs, obtained without expensive filtering or post-processing steps in the Conceptual Captions dataset.
Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling
A vital step towards the widespread adoption of neural retrieval models is their resource efficiency throughout the training, indexing and query workflows.
FlexiViT: One Model for All Patch Sizes
Vision Transformers convert images to sequences by slicing them into patches.
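The patch-slicing step can be sketched as a simple reshape; this is an illustrative helper (the name `patchify` is ours, not FlexiViT's), showing how varying `patch_size` changes the resulting sequence length, which is the degree of freedom FlexiViT exploits.

```python
import numpy as np

def patchify(image, patch_size):
    """Slice an image of shape (H, W, C) into a sequence of flattened patches.

    Sketch of how a Vision Transformer forms its input tokens; FlexiViT
    trains one model that works across many values of patch_size.
    """
    h, w, c = image.shape
    p = patch_size
    assert h % p == 0 and w % p == 0, "image dims must be divisible by patch size"
    # Reshape into a (rows, p, cols, p, C) grid, group the two patch axes,
    # then flatten each patch into one token vector.
    patches = image.reshape(h // p, p, w // p, p, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(-1, p * p * c)

img = np.zeros((224, 224, 3))
print(patchify(img, 16).shape)  # (196, 768): 14x14 patches of dim 16*16*3
print(patchify(img, 32).shape)  # (49, 3072): fewer, larger patches
```

Smaller patches give longer sequences (more compute, finer detail); larger patches give shorter ones, and FlexiViT lets a single model trade these off at inference time.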
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
We thus propose VIDAL-10M, a dataset pairing Video, Infrared, Depth, and Audio with their corresponding Language.
Single Shot Scene Text Retrieval
In this way, the text-based image retrieval task can be cast as a simple nearest-neighbor search of the query text representation over the CNN outputs for the entire image database.
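The retrieval-as-nearest-neighbor idea common to these papers can be sketched in a few lines: embed the query text and all database images into a shared vector space, then rank by cosine similarity. The embeddings below are random stand-ins, not outputs of any model from the papers above.

```python
import numpy as np

def retrieve(query_vec, image_vecs, k=3):
    """Return indices of the k database vectors most cosine-similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    db = image_vecs / np.linalg.norm(image_vecs, axis=1, keepdims=True)
    scores = db @ q  # cosine similarity of the query against every database item
    return np.argsort(-scores)[:k]

rng = np.random.default_rng(0)
database = rng.normal(size=(1000, 128))        # hypothetical image embeddings
query = database[42] + 0.01 * rng.normal(128)  # query nearly identical to item 42
print(retrieve(query, database, k=3)[0])       # top hit is index 42
```

At scale, the exhaustive `argsort` is replaced by an approximate nearest-neighbor index, but the interface stays the same: one query vector in, a ranked list of database items out.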
Image Chat: Engaging Grounded Conversations
To test such models, we collect a dataset of grounded human-human conversations, where speakers are asked to play roles given a provided emotional mood or style, as the use of such traits is also a key factor in engagingness (Guo et al., 2019).
WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning
First, WIT is the largest multimodal dataset by the number of image-text examples by 3x (at the time of writing).
Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning
As region-based visual features usually represent parts of an image, it is challenging for existing vision-language models to fully understand the semantics from paired natural languages.
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections
Large-scale pretrained foundation models have been an emerging paradigm for building artificial intelligence (AI) systems, which can be quickly adapted to a wide range of downstream tasks.