ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data

22 Jan 2020  ·  Di Qi, Lin Su, Jia Song, Edward Cui, Taroon Bharti, Arun Sacheti

In this paper, we introduce a new vision-language pre-trained model -- ImageBERT -- for image-text joint embedding. Our model is a Transformer-based model, which takes different modalities as input and models the relationship between them. The model is pre-trained on four tasks simultaneously: Masked Language Modeling (MLM), Masked Object Classification (MOC), Masked Region Feature Regression (MRFR), and Image Text Matching (ITM). To further enhance the pre-training quality, we have collected a Large-scale weAk-supervised Image-Text (LAIT) dataset from the Web. We first pre-train the model on this dataset, then conduct a second-stage pre-training on Conceptual Captions and SBU Captions. Our experiments show that this multi-stage pre-training strategy outperforms single-stage pre-training. We also fine-tune and evaluate our pre-trained ImageBERT model on image retrieval and text retrieval tasks, and achieve new state-of-the-art results on both the MSCOCO and Flickr30k datasets.
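As a rough illustration of how the four pre-training tasks described in the abstract could be combined into a single training objective, here is a minimal PyTorch-style sketch. The function and tensor names, the loss choice for each head (cross-entropy for MLM/MOC/ITM, mean-squared error for MRFR), and the equal weighting of the terms are assumptions for illustration, not the authors' released code.

```python
# Hedged sketch (not the authors' code): combining the four pre-training
# losses named in the abstract. All names and the unweighted sum are assumed.
import torch
import torch.nn.functional as F

def imagebert_pretrain_loss(text_logits, text_labels,
                            obj_logits, obj_labels,
                            region_preds, region_targets,
                            itm_logits, itm_labels):
    """Sum of MLM, MOC, MRFR and ITM losses; label -100 marks unmasked positions."""
    mlm = F.cross_entropy(text_logits.reshape(-1, text_logits.size(-1)),
                          text_labels.reshape(-1), ignore_index=-100)
    moc = F.cross_entropy(obj_logits.reshape(-1, obj_logits.size(-1)),
                          obj_labels.reshape(-1), ignore_index=-100)
    mrfr = F.mse_loss(region_preds, region_targets)   # regression on region features
    itm = F.cross_entropy(itm_logits, itm_labels)     # binary match / no-match
    return mlm + moc + mrfr + itm                     # assumed equal weighting

# Toy usage with random tensors (batch=2, 5 text tokens, 3 image regions):
vocab, classes, feat_dim = 30522, 1601, 2048
loss = imagebert_pretrain_loss(
    torch.randn(2, 5, vocab), torch.randint(0, vocab, (2, 5)),
    torch.randn(2, 3, classes), torch.randint(0, classes, (2, 3)),
    torch.randn(2, 3, feat_dim), torch.randn(2, 3, feat_dim),
    torch.randn(2, 2), torch.randint(0, 2, (2,)))
print(loss.item())
```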

Task: Zero-Shot Cross-Modal Retrieval · Model: ImageBERT

Dataset     Metric               Metric Value   Global Rank
COCO 2014   Image-to-text R@1    44.0           #15
COCO 2014   Image-to-text R@5    71.2           #15
COCO 2014   Image-to-text R@10   80.4           #14
COCO 2014   Text-to-image R@1    32.3           #15
COCO 2014   Text-to-image R@5    59.0           #15
COCO 2014   Text-to-image R@10   70.2           #14
Flickr30k   Image-to-text R@1    70.7           #18
Flickr30k   Image-to-text R@5    90.2           #19
Flickr30k   Image-to-text R@10   94.0           #17
Flickr30k   Text-to-image R@1    54.3           #19
Flickr30k   Text-to-image R@5    79.6           #19
Flickr30k   Text-to-image R@10   87.5           #17
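The Recall@K values above are standard retrieval metrics. Below is a minimal sketch of how such numbers are typically computed from joint image/text embeddings; the embedding shapes, the 1:1 image-caption pairing (MSCOCO and Flickr30k actually provide multiple captions per image), and the dot-product scoring are illustrative assumptions, not the paper's evaluation code.

```python
# Hedged sketch: Recall@K for cross-modal retrieval over N paired embeddings.
import torch

def recall_at_k(image_emb, text_emb, ks=(1, 5, 10)):
    """image_emb, text_emb: (N, d) L2-normalized tensors; row i of each is a true pair."""
    sims = image_emb @ text_emb.t()                   # (N, N) similarity matrix
    gt = torch.arange(sims.size(0))
    # Rank of the matching item for each query (0-based position in sorted results).
    i2t_rank = (sims.argsort(dim=1, descending=True) == gt[:, None]).float().argmax(dim=1)
    t2i_rank = (sims.t().argsort(dim=1, descending=True) == gt[:, None]).float().argmax(dim=1)
    out = {}
    for k in ks:
        out[f"Image-to-text R@{k}"] = (i2t_rank < k).float().mean().item() * 100
        out[f"Text-to-image R@{k}"] = (t2i_rank < k).float().mean().item() * 100
    return out

emb_i = torch.nn.functional.normalize(torch.randn(100, 512), dim=1)
emb_t = torch.nn.functional.normalize(torch.randn(100, 512), dim=1)
print(recall_at_k(emb_i, emb_t))
```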
