Text Retrieval

237 papers with code • 5 benchmarks • 14 datasets



Most implemented papers

Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning

CryhanFang/CLIP2Video CVPR 2020

To improve fine-grained video-text retrieval, we propose a Hierarchical Graph Reasoning (HGR) model, which decomposes video-text matching into global-to-local levels.

Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision

kakaobrain/coyo-dataset 11 Feb 2021

In this paper, we leverage a noisy dataset of over one billion image alt-text pairs, obtained without the expensive filtering or post-processing steps used in the Conceptual Captions dataset.

Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling

sebastian-hofstaetter/tas-balanced-dense-retrieval 14 Apr 2021

A vital step towards the widespread adoption of neural retrieval models is their resource efficiency throughout the training, indexing and query workflows.

FlexiViT: One Model for All Patch Sizes

google-research/big_vision CVPR 2023

Vision Transformers convert images to sequences by slicing them into patches.
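The patch-slicing step mentioned above can be sketched as a simple array reshape. The helper below is a hypothetical illustration, not code from the big_vision repository; it shows why the sequence length a Vision Transformer sees depends directly on the chosen patch size, which is the degree of freedom FlexiViT exploits.

```python
import numpy as np

def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Slice an (H, W, C) image into a sequence of flattened patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C).
    Assumes H and W are divisible by patch_size.
    """
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    p = patch_size
    # (H//p, p, W//p, p, C) -> (H//p, W//p, p, p, C) -> (N, p*p*C)
    blocks = image.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    return blocks.reshape(-1, p * p * c)

img = np.arange(8 * 8 * 3, dtype=np.float32).reshape(8, 8, 3)
print(patchify(img, 4).shape)  # (4, 48): 4 patches, each 4*4*3 values
print(patchify(img, 2).shape)  # (16, 12): smaller patches, longer sequence
```

The same image yields a 4-token or a 16-token sequence depending on patch size, which is why a model trained for one patch size does not transfer for free to another.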

LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

pku-yuangroup/languagebind 3 Oct 2023

We thus propose VIDAL-10M, a dataset pairing Video, Infrared, Depth, and Audio with their corresponding Language descriptions.

Single Shot Scene Text Retrieval

lluisgomez/single-shot-str ECCV 2018

In this way, the text-based image retrieval task can be cast as a simple nearest-neighbor search of the query text representation over the CNN outputs for the entire image database.
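That nearest-neighbor step can be sketched in a few lines. This is a toy stand-in with random vectors, not the paper's actual text or image embeddings; it only illustrates casting retrieval as a similarity search over precomputed database representations.

```python
import numpy as np

def retrieve(query_emb: np.ndarray, db_embs: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k database entries most similar to the query
    under cosine similarity."""
    q = query_emb / np.linalg.norm(query_emb)
    db = db_embs / np.linalg.norm(db_embs, axis=1, keepdims=True)
    sims = db @ q                      # cosine similarity to every entry
    return np.argsort(-sims)[:k]       # indices sorted by decreasing similarity

rng = np.random.default_rng(0)
db = rng.standard_normal((100, 64))            # stand-in database embeddings
query = db[42] + 0.01 * rng.standard_normal(64)  # query close to entry 42
print(retrieve(query, db, k=3)[0])             # 42
```

Because the database representations are computed once offline, query time reduces to a single matrix-vector product plus a top-k selection.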

Image Chat: Engaging Grounded Conversations

facebookresearch/ParlAI 2 Nov 2018

To test such models, we collect a dataset of grounded human-human conversations, where speakers are asked to play roles given a provided emotional mood or style, as the use of such traits is also a key factor in engagingness (Guo et al., 2019).

WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning

google-research-datasets/wit 2 Mar 2021

First, WIT is the largest multimodal dataset by number of image-text examples, 3x larger than the next largest (at the time of writing).

Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning

researchmm/soho CVPR 2021

As region-based visual features usually represent parts of an image, it is challenging for existing vision-language models to fully understand the semantics from paired natural languages.

mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections

alibaba/AliceMind 24 May 2022

Large-scale pretrained foundation models have become an emerging paradigm for building artificial intelligence (AI) systems, as they can be quickly adapted to a wide range of downstream tasks.