131 papers with code • 1 benchmark • 5 datasets
These leaderboards are used to track progress in Text Retrieval
Libraries
Use these libraries to find Text Retrieval models and implementations.
Most implemented papers
Knowledge Guided Text Retrieval and Reading for Open Domain Question Answering
We introduce an approach for open-domain question answering (QA) that retrieves and reads a passage graph, where vertices are passages of text and edges represent relationships that are derived from an external knowledge base or co-occurrence in the same article.
Stacked Cross Attention for Image-Text Matching
Prior work either simply aggregates the similarity of all possible pairs of regions and words without attending differentially to more and less important words or regions, or uses a multi-step attentional process to capture a limited number of semantic alignments, which is less interpretable.
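The attention mechanism SCAN describes can be sketched in a few lines: each word attends over all image regions, and the image-text score aggregates word-to-attended-context similarities. The feature dimensions, the softmax temperature, and the mean aggregation below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def stacked_cross_attention_score(regions, words, temperature=4.0):
    """Toy sketch of word-to-region cross attention for image-text
    matching (dimensions and temperature are assumed for illustration)."""
    # L2-normalise region and word features.
    regions = regions / np.linalg.norm(regions, axis=1, keepdims=True)
    words = words / np.linalg.norm(words, axis=1, keepdims=True)
    sim = words @ regions.T                        # (n_words, n_regions)
    attn = np.exp(temperature * sim)
    attn /= attn.sum(axis=1, keepdims=True)        # each word attends over regions
    attended = attn @ regions                      # region context per word
    # Score: average cosine similarity between words and their contexts.
    cos = np.sum(words * attended, axis=1) / (np.linalg.norm(attended, axis=1) + 1e-8)
    return float(cos.mean())

rng = np.random.default_rng(0)
score = stacked_cross_attention_score(rng.normal(size=(36, 8)),  # 36 fake regions
                                      rng.normal(size=(5, 8)))   # 5 fake words
```

Stacking a second pass (attending regions over words) and combining both directions is what gives the method its name.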
UNITER: UNiversal Image-TExt Representation Learning
Different from previous work that applies joint random masking to both modalities, we use conditional masking on pre-training tasks (i.e., masked language/region modeling is conditioned on full observation of image/text).
Language-agnostic BERT Sentence Embedding
While BERT is an effective method for learning monolingual sentence embeddings for semantic similarity and embedding based transfer learning (Reimers and Gurevych, 2019), BERT based cross-lingual sentence embeddings have yet to be explored.
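The retrieval use of cross-lingual sentence embeddings like these reduces to nearest-neighbor search in embedding space. The 4-dimensional vectors below are fabricated stand-ins for real model outputs; only the cosine-ranking logic is the point.

```python
import numpy as np

# Fake "sentence embeddings" for a corpus in several languages; a model
# such as LaBSE would produce high-dimensional vectors with this property:
# translations land near each other regardless of language.
corpus = {
    "en": np.array([0.90, 0.10, 0.00, 0.20]),
    "de": np.array([0.88, 0.12, 0.05, 0.18]),  # near-translation of the query
    "fr": np.array([0.10, 0.90, 0.30, 0.00]),  # unrelated sentence
}
query = np.array([0.92, 0.08, 0.02, 0.21])

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Retrieve the corpus entry most similar to the query.
best = max(corpus, key=lambda k: cosine(query, corpus[k]))
```

With real embeddings the same ranking step is typically backed by an approximate nearest-neighbor index rather than a full scan.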
Learning a Text-Video Embedding from Incomplete and Heterogeneous Data
We evaluate our method on the task of video retrieval and report results for the MPII Movie Description and MSR-VTT datasets.
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
Our objective in this work is video-text retrieval, in particular a joint embedding that enables efficient text-to-video retrieval.
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Furthermore, performance improvement has been largely achieved by scaling up the dataset with noisy image-text pairs collected from the web, which is a suboptimal source of supervision.
Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning
To improve fine-grained video-text retrieval, we propose a Hierarchical Graph Reasoning (HGR) model, which decomposes video-text matching into global-to-local levels.
Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval
In this paper, we identify that the main bottleneck is in the training mechanisms, where the negative instances used in training are not representative of the irrelevant documents in testing.
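The core idea (selecting negatives that the current encoder finds hard, rather than random ones) can be sketched with a brute-force scan; ANCE itself uses an approximate nearest-neighbor index over the full corpus, periodically refreshed as the encoder trains. The embedding sizes below are arbitrary.

```python
import numpy as np

def hardest_negative(query_emb, doc_embs, positive_idx):
    """Sketch of hard-negative selection for dense retrieval training:
    return the irrelevant document the current encoder scores highest.
    (In practice this scan is replaced by an ANN index lookup.)"""
    scores = doc_embs @ query_emb        # dot-product relevance scores
    scores[positive_idx] = -np.inf       # exclude the relevant document
    return int(np.argmax(scores))

rng = np.random.default_rng(1)
docs = rng.normal(size=(100, 16))        # fake corpus embeddings
q = rng.normal(size=16)                  # fake query embedding
neg = hardest_negative(q, docs, positive_idx=0)
```

The selected index then supplies the negative passage in the contrastive loss, which is the paper's remedy for negatives that are unrepresentative of the irrelevant documents seen at test time.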
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
In this paper, we propose a CLIP4Clip model to transfer the knowledge of the CLIP model to video-language retrieval in an end-to-end manner.
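The simplest aggregation studied in this line of work is mean-pooling per-frame image embeddings into a single video embedding and scoring it against a text embedding by cosine similarity; CLIP4Clip also evaluates sequence- and attention-based aggregators. The random vectors below stand in for real CLIP features.

```python
import numpy as np

rng = np.random.default_rng(2)
frames = rng.normal(size=(12, 512))      # fake per-frame CLIP image features
text_emb = rng.normal(size=512)          # fake CLIP text feature

# Mean-pool frame features into one video-level embedding.
video_emb = frames.mean(axis=0)

# Rank candidates by cosine similarity between text and video embeddings.
score = float(video_emb @ text_emb /
              (np.linalg.norm(video_emb) * np.linalg.norm(text_emb)))
```

This parameter-free pooling is a strong baseline precisely because the frame encoder already produces embeddings aligned with text.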