Image-Sentence Alignment

2 papers with code • 12 benchmarks • 1 dataset

Predict an alignment score between an image and a sentence, i.e., how well the sentence matches the content of the image.
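As a minimal sketch, assuming a dual-encoder model (such as CLIP) that maps images and sentences into a shared embedding space, the alignment score can be taken as the cosine similarity of the two embeddings. Cross-encoder models such as ViLBERT instead score the pair with a learned matching head, which is not shown here. The helper name `cosine_alignment` and the toy vectors are hypothetical; a real model would produce the embeddings with its own image and text encoders.

```python
import math
from typing import Sequence


def cosine_alignment(image_emb: Sequence[float], text_emb: Sequence[float]) -> float:
    """Cosine similarity between an image embedding and a sentence embedding.

    Hypothetical helper: assumes both vectors come from a dual encoder that
    places images and sentences in the same space (the encoders are not shown).
    """
    if len(image_emb) != len(text_emb):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm_image = math.sqrt(sum(a * a for a in image_emb))
    norm_text = math.sqrt(sum(b * b for b in text_emb))
    if norm_image == 0.0 or norm_text == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_image * norm_text)


# Toy 3-d vectors (invented for illustration, not real model outputs).
image = [0.2, 0.9, 0.1]
caption = [0.25, 0.85, 0.0]
unrelated = [0.9, -0.1, 0.4]
print(cosine_alignment(image, caption) > cosine_alignment(image, unrelated))  # True
```

Cosine similarity ignores vector length, so only the direction of each embedding affects the score; this is why dual encoders are often trained with normalized embeddings.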

Benchmarks


These leaderboards track progress in image-sentence alignment.

Dataset	Best Model
VALSE	ViLBERT 12-in-1
VALSE existence	ViLBERT 12-in-1
VALSE plurality	ViLBERT 12-in-1
VALSE counting balanced	ViLBERT 12-in-1
VALSE counting small numbers	ViLBERT 12-in-1
VALSE counting adversarial	ViLBERT 12-in-1
VALSE spatial relations	GPT1
VALSE action replacement	CLIP
VALSE actant swap	GPT2
VALSE coreference standard	ViLBERT 12-in-1
VALSE coreference clean	ViLBERT 12-in-1
VALSE foil-it (noun phrases)	CLIP

Datasets

VALSE

Most implemented papers

Removing Word-Level Spurious Alignment between Images and Pseudo-Captions in Unsupervised Image Captioning

ukyh/RemovingSpuriousAlignment • EACL 2021

Unsupervised image captioning is a challenging task that aims to generate captions without supervision from image-sentence pairs, using only images and sentences drawn from different sources, together with object labels detected in the images.

VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena

heidelberg-nlp/valse • ACL 2022

We propose VALSE (Vision And Language Structured Evaluation), a novel benchmark designed to test general-purpose pretrained vision and language (V&L) models on their visio-linguistic grounding capabilities for specific linguistic phenomena.
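One natural way to turn an alignment score into a benchmark number, sketched here under assumptions, is a caption-versus-foil comparison: each test item pairs an image with its correct caption and a minimally altered "foil" sentence (as the "foil-it" leaderboard name suggests), and the model counts as correct when it scores the caption strictly above the foil. The `FoilItem` class, the `pairwise_accuracy` function, and the `score(image_id, sentence)` callback are hypothetical stand-ins for any image-sentence alignment model; the toy scores below are invented and are not benchmark results.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class FoilItem:
    """Hypothetical test item: an image identifier, its correct caption, and a foil."""
    image_id: str
    caption: str
    foil: str


def pairwise_accuracy(
    items: Iterable[FoilItem], score: Callable[[str, str], float]
) -> float:
    """Fraction of items where the caption scores strictly higher than the foil.

    `score(image_id, sentence)` is a placeholder for any alignment model.
    """
    item_list = list(items)
    if not item_list:
        raise ValueError("need at least one item")
    correct = sum(
        1
        for it in item_list
        if score(it.image_id, it.caption) > score(it.image_id, it.foil)
    )
    return correct / len(item_list)


# Invented toy scores standing in for a real model's outputs.
toy_scores = {
    ("img1", "There are two dogs."): 0.9,
    ("img1", "There is one dog."): 0.4,
    ("img2", "A man is riding a horse."): 0.3,
    ("img2", "A horse is riding a man."): 0.5,
}
items = [
    FoilItem("img1", "There are two dogs.", "There is one dog."),
    FoilItem("img2", "A man is riding a horse.", "A horse is riding a man."),
]
print(pairwise_accuracy(items, lambda i, s: toy_scores[(i, s)]))  # 0.5
```

Comparing the caption and foil for the same image, rather than thresholding each score independently, means the metric does not depend on how a given model's scores are calibrated.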