BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection

31 Jan 2019 · Hedi Ben-Younes, Rémi Cadene, Nicolas Thome, Matthieu Cord

Multimodal representation learning is gaining more and more interest within the deep learning community. While bilinear models provide an interesting framework for finding subtle combinations of modalities, their number of parameters grows quadratically with the input dimensions, which makes their practical implementation within classical deep learning pipelines challenging. In this paper, we introduce BLOCK, a new multimodal fusion based on the block-superdiagonal tensor decomposition. It leverages the notion of block-term ranks, which generalizes both the rank and the mode ranks of tensors, two notions already used for multimodal fusion. It makes it possible to define new ways of optimizing the tradeoff between the expressiveness and the complexity of the fusion model, and it can represent very fine interactions between modalities while maintaining powerful mono-modal representations. We demonstrate the practical interest of our fusion model by using BLOCK for two challenging tasks, Visual Question Answering (VQA) and Visual Relationship Detection (VRD), for which we design end-to-end learnable architectures that represent relevant interactions between modalities. Through extensive experiments, we show that BLOCK compares favorably with state-of-the-art multimodal fusion models on both the VQA and VRD tasks. Our code is available at https://github.com/Cadene/block.bootstrap.pytorch.
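The core idea of the abstract, a bilinear fusion whose three-way tensor is constrained to a block-superdiagonal (block-term) structure, can be illustrated with a short NumPy sketch. It assumes the block-term form T = Σ_r D_r ×1 A_r ×2 B_r ×3 C_r: both inputs are linearly projected, each projection is split into R chunks, each pair of chunks is fused by its own small core tensor D_r, and the R block outputs are concatenated and projected to the output size. The class `BlockFusion`, its parameter names, and the random initialization are hypothetical illustrations, not the API of Cadene/block.bootstrap.pytorch, and the sketch omits any nonlinearity, normalization, or dropout the official code may use. `as_full_tensor` rebuilds the equivalent dense tensor so the fused output can be checked against an unconstrained bilinear product, and `num_params` can be compared with the dim_x · dim_y · dim_out entries of a full bilinear tensor.

```python
import numpy as np


def full_bilinear_params(dim_x: int, dim_y: int, dim_out: int) -> int:
    """Entries of an unconstrained bilinear tensor W of shape (dim_x, dim_y, dim_out)."""
    return dim_x * dim_y * dim_out


class BlockFusion:
    """Hypothetical sketch of a block-superdiagonal (block-term) bilinear fusion.

    Equivalent dense tensor: T = sum_r D_r x1 A_r x2 B_r x3 C_r.
    """

    def __init__(self, dim_x, dim_y, dim_out, n_blocks, proj_x, proj_y, core_out, seed=0):
        if proj_x % n_blocks or proj_y % n_blocks:
            raise ValueError("proj_x and proj_y must be divisible by n_blocks")
        rng = np.random.default_rng(seed)
        self.n_blocks = n_blocks
        chunk_x, chunk_y = proj_x // n_blocks, proj_y // n_blocks
        # Input projections A (proj_x, dim_x) and B (proj_y, dim_y).
        self.A = rng.standard_normal((proj_x, dim_x)) / np.sqrt(dim_x)
        self.B = rng.standard_normal((proj_y, dim_y)) / np.sqrt(dim_y)
        # One small core tensor D_r per block: (n_blocks, chunk_x, chunk_y, core_out).
        self.cores = rng.standard_normal((n_blocks, chunk_x, chunk_y, core_out))
        # Output projection C maps the concatenated block outputs to dim_out.
        self.C = rng.standard_normal((dim_out, n_blocks * core_out))

    def num_params(self) -> int:
        return self.A.size + self.B.size + self.cores.size + self.C.size

    def __call__(self, x, y):
        # Project each modality and split the projection into n_blocks chunks.
        x_chunks = (self.A @ x).reshape(self.n_blocks, -1)  # (R, chunk_x)
        y_chunks = (self.B @ y).reshape(self.n_blocks, -1)  # (R, chunk_y)
        # Per-block bilinear product: z[r, n] = sum_{l,m} D_r[l, m, n] x_r[l] y_r[m].
        z = np.einsum("rlmn,rl,rm->rn", self.cores, x_chunks, y_chunks)
        # Concatenate the R block outputs and apply the output projection.
        return self.C @ z.reshape(-1)

    def as_full_tensor(self):
        """Rebuild the equivalent dense tensor T of shape (dim_x, dim_y, dim_out)."""
        R = self.n_blocks
        A_r = self.A.reshape(R, -1, self.A.shape[1])  # (R, chunk_x, dim_x)
        B_r = self.B.reshape(R, -1, self.B.shape[1])  # (R, chunk_y, dim_y)
        C_r = self.C.reshape(self.C.shape[0], R, -1)  # (dim_out, R, core_out)
        return np.einsum("rlmn,rli,rmj,krn->ijk", self.cores, A_r, B_r, C_r)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    fusion = BlockFusion(dim_x=6, dim_y=8, dim_out=5, n_blocks=2,
                         proj_x=4, proj_y=6, core_out=3)
    x, y = rng.standard_normal(6), rng.standard_normal(8)
    # The block-structured output matches the dense bilinear form x^T T_k y.
    dense = np.einsum("i,j,ijk->k", x, y, fusion.as_full_tensor())
    print(np.allclose(fusion(x, y), dense))  # True
    print(fusion.num_params(), full_bilinear_params(6, 8, 5))  # 138 240
```

In this sketch, `n_blocks=1` leaves a single Tucker-style core, while cores of size 1×1×1 (`proj_x = proj_y = n_blocks`, `core_out = 1`) give a CP decomposition of rank at most `n_blocks`. This is the sense in which block-term ranks sit between the rank and the mode ranks of a tensor.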

Code

Cadene/block.bootstrap.pytorch (official implementation, 333 GitHub stars)

Tasks

Question Answering

Relationship Detection

Representation Learning

Tensor Decomposition

Visual Question Answering

Visual Question Answering (VQA)

Visual Relationship Detection

Datasets

Visual Question Answering

Visual Question Answering v2.0

VRD

TDIUC

Results from the Paper

Ranked #2 on Visual Relationship Detection on VRD Phrase Detection

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Visual Question Answering (VQA)	VQA v2 test-dev	BLOCK	Accuracy	67.58	#38
Visual Question Answering (VQA)	VQA v2 test-std	BLOCK	overall	67.9	#34
Visual Relationship Detection	VRD Phrase Detection	BLOCK	R@100	28.96	#2
Visual Relationship Detection	VRD Phrase Detection	BLOCK	R@50	26.32	#1
Visual Relationship Detection	VRD Predicate Detection	BLOCK	R@100	92.58	#3
Visual Relationship Detection	VRD Predicate Detection	BLOCK	R@50	86.58	#1
Visual Relationship Detection	VRD Relationship Detection	BLOCK	R@100	20.96	#3
Visual Relationship Detection	VRD Relationship Detection	BLOCK	R@50	19.06	#2

Methods

No methods listed for this paper.
