Semantic Image-Text Similarity

3 papers with code • 1 benchmark • 4 datasets

Most implemented papers

Learning the Best Pooling Strategy for Visual Semantic Embedding

woodfrog/vse_infty CVPR 2021

Visual Semantic Embedding (VSE) is a dominant approach for vision-language retrieval, which aims at learning a deep embedding space such that visual data are embedded close to their semantic text labels or descriptions.
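
As a rough illustration of retrieval in a shared embedding space, the sketch below ranks captions by cosine similarity to an image embedding. The vectors are hand-made toy values and the names `cosine` and `rank_captions` are hypothetical; in a real system the embeddings would come from trained image and text encoders, and this is not the paper's implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)

def rank_captions(image_vec, caption_vecs):
    """Return caption indices sorted from most to least similar to the image."""
    scores = [cosine(image_vec, c) for c in caption_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Toy embeddings (assumed values, for illustration only).
image_vec = [0.9, 0.1, 0.0]
caption_vecs = [
    [0.0, 1.0, 0.0],  # caption 0
    [1.0, 0.2, 0.0],  # caption 1: points in nearly the same direction as the image
    [0.0, 0.0, 1.0],  # caption 2: orthogonal to the image
]

ranking = rank_captions(image_vec, caption_vecs)
print(ranking)
```

Ranking by cosine similarity means only the direction of each embedding matters, not its length.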

Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment

crossmodalgroup/laps CVPR 2024

We propose a novel Linguistic-Aware Patch Slimming (LAPS) framework for fine-grained alignment which explicitly identifies redundant visual patches with language supervision and rectifies their semantic and spatial information to facilitate more effective and consistent patch-word alignment.

Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models

tmlr-group/wca 5 Jun 2024

The local visual areas are then cross-aligned with the finer descriptions by creating a similarity matrix using the pre-trained VLM.
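
A minimal sketch of such a similarity matrix, assuming the local visual areas and the finer descriptions have already been encoded into vectors: entry (i, j) is the cosine similarity between area i and description j. The toy vectors, the helper names, and the unweighted mean used to collapse the matrix into one score are all assumptions for illustration; the paper may aggregate the matrix differently.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def similarity_matrix(region_vecs, description_vecs):
    """Rows index local visual areas, columns index text descriptions."""
    regions = [normalize(r) for r in region_vecs]
    descs = [normalize(d) for d in description_vecs]
    return [[sum(a * b for a, b in zip(r, d)) for d in descs] for r in regions]

# Toy embeddings (assumed values): 2 local areas, 3 descriptions.
region_vecs = [[1.0, 0.0], [0.0, 2.0]]
description_vecs = [[3.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

sim = similarity_matrix(region_vecs, description_vecs)

# Assumption: collapse the matrix with a plain, unweighted mean.
overall = sum(sum(row) for row in sim) / (len(sim) * len(sim[0]))
print(sim, overall)
```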