Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Patch Matching	Brown Dataset	Multiscale Transformer Encoder	FPR95	0.9	# 1
Multimodal Patch Matching	VisNir	Multiscale Transformer Encoder	FPR95	1.44	# 1

Attention-Based Multimodal Image Matching

20 Mar 2021 · Aviad Moreshet, Yosi Keller

We propose an attention-based approach for multimodal image patch matching using a Transformer encoder attending to the feature maps of a multiscale Siamese CNN. Our encoder is shown to efficiently aggregate multiscale image embeddings while emphasizing task-specific appearance-invariant image cues. We also introduce an attention-residual architecture, using a residual connection bypassing the encoder. This additional learning signal facilitates end-to-end training from scratch. Our approach is experimentally shown to achieve new state-of-the-art accuracy on both multimodal and single modality benchmarks, illustrating its general applicability. To the best of our knowledge, this is the first successful implementation of the Transformer encoder architecture to the multimodal image patch matching task.
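The architecture described in the abstract can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the class names, layer sizes, two-scale backbone, average pooling into tokens, mean aggregation, and the way the bypass is added to the encoder output are all assumptions made for illustration, and positional encodings are left out for brevity. It only shows the general idea of a shared (Siamese) CNN producing multiscale feature maps, a Transformer encoder attending over them, and a residual path that skips the encoder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiscaleBackbone(nn.Module):
    """Toy CNN returning feature maps at two scales (hypothetical sizes)."""

    def __init__(self, dim=64):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(1, dim, 3, stride=2, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU())

    def forward(self, x):
        f1 = self.stage1(x)   # (B, dim, H/2, W/2)
        f2 = self.stage2(f1)  # (B, dim, H/4, W/4)
        return [f1, f2]


class AttentionResidualEmbedder(nn.Module):
    """Hypothetical embedder: encoder over multiscale tokens plus a bypass."""

    def __init__(self, dim=64, heads=4, layers=2, pooled=4):
        super().__init__()
        self.backbone = MultiscaleBackbone(dim)
        # Pool every scale to the same grid so each contributes pooled*pooled tokens.
        self.pool = nn.AdaptiveAvgPool2d(pooled)
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=2 * dim, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, x):
        maps = self.backbone(x)
        # Flatten each pooled map into a token sequence and concatenate the scales.
        tokens = torch.cat(
            [self.pool(m).flatten(2).transpose(1, 2) for m in maps], dim=1
        )  # (B, num_scales * pooled * pooled, dim)
        attended = self.encoder(tokens).mean(dim=1)  # aggregate attended tokens
        bypass = tokens.mean(dim=1)                  # residual path skipping the encoder
        return F.normalize(attended + bypass, dim=1)


# Siamese usage: the same weights embed both patches of a pair.
torch.manual_seed(0)
model = AttentionResidualEmbedder().eval()
visible = torch.randn(8, 1, 64, 64)   # stand-in for visible-spectrum patches
infrared = torch.randn(8, 1, 64, 64)  # stand-in for near-infrared patches
with torch.no_grad():
    distances = (model(visible) - model(infrared)).norm(dim=1)
print(distances.shape)  # torch.Size([8])
```

In this sketch a smaller distance would indicate a more likely match. The bypass gives the CNN a gradient path that does not pass through the encoder, which is one plausible reading of how the residual connection could ease training from scratch.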

Code

CodeJjang/multiscale-attention-patc… official

Tasks

Multimodal Patch Matching

Patch Matching

Results from the Paper

Ranked #1 on Multimodal Patch Matching on VisNir
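Both benchmark results for this paper are reported as FPR95, the false positive rate at the decision threshold where 95% of the true matching pairs are accepted (lower is better). Patch matching benchmarks conventionally report this as a percentage, so the listed values are presumably percentages too. The following is a minimal pure-Python sketch of the metric, assuming similarity scores where higher means "more likely a match"; the function name `fpr_at_tpr` and its parameters are hypothetical and not taken from the paper's code.

```python
import math


def fpr_at_tpr(scores, labels, target_tpr=0.95):
    """False positive rate at the threshold that first reaches target_tpr.

    scores: similarity per pair, higher = more likely a match (assumption).
    labels: 1 for a true matching pair, 0 for a non-matching pair.
    """
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one matching and one non-matching pair")
    # Number of matching pairs that must be accepted to reach the target TPR.
    # The small epsilon guards against floating-point overshoot before ceil.
    needed = max(1, math.ceil(target_tpr * len(positives) - 1e-9))
    threshold = positives[needed - 1]
    # Every non-matching pair scoring at or above the threshold is a false positive.
    false_positives = sum(1 for s in negatives if s >= threshold)
    return false_positives / len(negatives)


# Toy data: 20 matching pairs scored 1..20 and 4 non-matching pairs.
scores = list(range(1, 21)) + [0, 0, 1, 3]
labels = [1] * 20 + [0] * 4
print(fpr_at_tpr(scores, labels))  # 0.25
```

With 20 matching pairs, 19 must be accepted to reach 95% recall, which puts the threshold at a score of 2. One of the four non-matching pairs scores at or above it, giving a rate of 0.25, or 25% when expressed as a percentage.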

Methods

Absolute Position Encodings • Adam • BPE • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer
