LILE: Look In-Depth before Looking Elsewhere -- A Dual Attention Network using Transformers for Cross-Modal Information Retrieval in Histopathology Archives

2 Mar 2022 · Danial Maleki, H. R. Tizhoosh

The volume of available data has grown dramatically in recent years across many applications, and the era of networks that process each modality in isolation is effectively over. Bidirectional cross-modal retrieval has therefore become a requirement in many domains and research disciplines. This is especially true in medicine, where data arrives in many forms, including various image types, textual reports, and molecular data. Most contemporary works apply cross attention to highlight the essential elements of an image or text with respect to the other modality and attempt to match them. However, these approaches usually weight the features of each modality equally, regardless of how important each feature is within its own modality. In this study, self-attention is proposed as an additional loss term to enrich the internal representation provided to the cross-attention module. This work introduces a novel architecture with a new loss term that helps represent images and texts in a joint latent space. Experimental results on two benchmark datasets, MS-COCO and ARCH, show the effectiveness of the proposed method.
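The core idea can be sketched as follows: each modality first attends to itself ("look in-depth") before its enriched features are matched against the other modality via cross attention ("look elsewhere"). The PyTorch module below is a minimal illustrative sketch of this dual-attention pattern, not the authors' implementation; the module names, dimensions, and the cosine-based consistency loss are assumptions.

```python
# A minimal sketch of the dual-attention idea: self-attention within a
# modality, then cross attention against the other modality. All names
# and sizes are illustrative assumptions, not the paper's actual code.
import torch
import torch.nn as nn

class DualAttentionBlock(nn.Module):
    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor, other: torch.Tensor):
        # "Look in-depth": refine features within the modality itself.
        refined, _ = self.self_attn(x, x, x)
        # "Look elsewhere": attend to the other modality's features.
        matched, _ = self.cross_attn(refined, other, other)
        return refined, matched

# One plausible form of the additional loss term (an assumption here):
# penalize the distance between the self-attended representation and
# the cross-attended one, so per-modality structure is preserved in
# the joint latent space.
def dual_attention_loss(refined: torch.Tensor, matched: torch.Tensor) -> torch.Tensor:
    return (1.0 - nn.functional.cosine_similarity(refined, matched, dim=-1)).mean()
```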


Datasets

MS-COCO · ARCH
Results from the Paper


Task: Cross-Modal Retrieval
Dataset: COCO 2014
Model: LILE

Metric               Value   Global Rank
Image-to-text R@1    55.6    # 25
Image-to-text R@5    82.4    # 25
Image-to-text R@10   91.0    # 23
Text-to-image R@1    41.5    # 28
Text-to-image R@5    72.1    # 25
Text-to-image R@10   82.2    # 24
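For reference, the R@K values above are standard recall-at-K retrieval metrics: the fraction of queries whose ground-truth match appears among the top K retrieved candidates. Below is a generic NumPy sketch, not tied to the paper's code; it assumes a one-to-one query/candidate correspondence, whereas MS-COCO in practice pairs each image with five captions, making the ground-truth set larger.

```python
# Generic recall@K for retrieval: given a similarity matrix between
# N queries and N candidates where candidate i is the ground-truth
# match of query i, count how often the match lands in the top K.
import numpy as np

def recall_at_k(sim: np.ndarray, k: int) -> float:
    ranks = np.argsort(-sim, axis=1)   # candidates sorted best-first per query
    top_k = ranks[:, :k]               # indices of the top-K retrieved candidates
    hits = (top_k == np.arange(sim.shape[0])[:, None]).any(axis=1)
    return float(hits.mean())

# Example usage: sim = image_embeddings @ text_embeddings.T
# print(recall_at_k(sim, 1), recall_at_k(sim, 5), recall_at_k(sim, 10))
```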
