Text-Image Retrieval

16 papers with code • 8 benchmarks • 6 datasets

Text-image retrieval includes two tasks: (1) image as query and texts as targets; (2) text as query and images as targets.
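The two retrieval directions can be illustrated with a minimal sketch. It assumes images and texts have already been embedded into a shared vector space and are ranked by cosine similarity. The toy embeddings and the helper names `cosine` and `retrieve` are hypothetical, chosen only for illustration, and are not taken from any of the papers listed here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def retrieve(query, targets, k=1):
    """Return the indices of the k targets most similar to the query."""
    scores = [cosine(query, t) for t in targets]
    ranked = sorted(range(len(targets)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

# Hand-made toy embeddings (assumption): image i is paired with text i.
image_embs = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.1], [0.1, 0.1, 1.0]]
text_embs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.2, 0.9]]

# Task 1: image as query, texts as targets.
i2t = [retrieve(img, text_embs, k=1)[0] for img in image_embs]

# Task 2: text as query, images as targets.
t2i = [retrieve(txt, image_embs, k=1)[0] for txt in text_embs]

# Recall@1: fraction of queries whose top-ranked target is the paired item.
recall_i2t = sum(pred == i for i, pred in enumerate(i2t)) / len(i2t)
recall_t2i = sum(pred == i for i, pred in enumerate(t2i)) / len(t2i)
```

In this sketch both directions share the same similarity function and differ only in which modality supplies the query. Real systems would obtain the embeddings from trained image and text encoders instead of hand-written vectors.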

Most implemented papers

Deep Visual-Semantic Alignments for Generating Image Descriptions

VinitSR7/Image-Caption-Generation CVPR 2015

Our approach leverages datasets of images and their sentence descriptions to learn about the inter-modal correspondences between language and visual data.

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks

microsoft/Oscar ECCV 2020

Large-scale pre-training methods of learning cross-modal representations on image-text pairs are becoming popular for vision-language tasks.

Dual-Path Convolutional Image-Text Embeddings with Instance Loss

layumi/Image-Text-Embedding 15 Nov 2017

In this paper, we propose a new system to discriminatively embed the image and text to a shared visual-textual space.

WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning

google-research-datasets/wit 2 Mar 2021

First, WIT is the largest multimodal dataset by the number of image-text examples by 3x (at the time of writing).

RTIC: Residual Learning for Text and Image Composition using Graph Convolutional Network

nashory/rtic-gcn-pytorch 7 Apr 2021

In this paper, we study the compositional learning of images and texts for image retrieval.

SoDeep: a Sorting Deep net to learn ranking loss surrogates

technicolor-research/sodeep CVPR 2019

Our approach is based on a deep architecture that approximates the sorting of arbitrary sets of scores.

CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval

ZihaoWang-CV/CAMP_iccv19 ICCV 2019

Text-image cross-modal retrieval is a challenging task in the field of language and vision.

CurlingNet: Compositional Learning between Images and Text for Fashion IQ Data

nashory/rtic-gcn-pytorch 27 Mar 2020

In order to learn an effective image-text composition for the data in the fashion domain, our model proposes two key components as follows.

Image Search With Text Feedback by Visiolinguistic Attention Learning

yanbeic/VAL CVPR 2020

In this work, we tackle this task by a novel Visiolinguistic Attention Learning (VAL) framework.

Compositional Learning of Image-Text Query for Image Retrieval

ecom-research/ComposeAE 19 Jun 2020

In this paper, we investigate the problem of retrieving images from a database based on a multi-modal (image-text) query.