Vision-Language Navigation with Self-Supervised Auxiliary Reasoning Tasks

CVPR 2020 · Fengda Zhu, Yi Zhu, Xiaojun Chang, Xiaodan Liang

Vision-Language Navigation (VLN) is a task in which agents learn to navigate by following natural language instructions. The key to this task is to perceive both the visual scene and the natural language instruction sequentially. Conventional approaches exploit vision and language features in cross-modal grounding. However, the VLN task remains challenging, since previous works have neglected the rich semantic information contained in the environment (such as implicit navigation graphs or sub-trajectory semantics). In this paper, we introduce Auxiliary Reasoning Navigation (AuxRN), a framework with four self-supervised auxiliary reasoning tasks that take advantage of additional training signals derived from this semantic information. The four auxiliary tasks have the following reasoning objectives: explaining the previous actions, estimating the navigation progress, predicting the next orientation, and evaluating the trajectory consistency. As a result, these additional training signals help the agent acquire semantic representations with which it can reason about its activity and build a thorough perception of the environment. Our experiments indicate that the auxiliary reasoning tasks improve both the performance on the main task and the model's generalizability by a large margin. Empirically, we demonstrate that an agent trained with self-supervised auxiliary reasoning tasks substantially outperforms the previous state-of-the-art method and is the best existing approach on the standard benchmark.
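The auxiliary objectives are self-supervised: their targets can be read off the trajectory itself rather than from extra annotation. The sketch below illustrates that idea in plain Python under loudly stated assumptions. The progress target is assumed to be the fraction of steps completed (t/T), the orientation target is assumed to be the heading change to the next viewpoint wrapped to [-pi, pi), consistency negatives are assumed to come from pairing an instruction with a different trajectory, and the overall objective is assumed to be a weighted sum of the main loss and the auxiliary losses. The names `Step`, `progress_labels`, `orientation_labels`, `matching_pairs` and `total_loss`, and all weights, are hypothetical and are not taken from the paper. The action-explanation task is left out because its target (the instruction) only becomes useful together with a learned decoder.

```python
import math
import random
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Step:
    """Hypothetical per-step record: the agent's heading (radians) at a viewpoint."""
    heading: float


def progress_labels(num_steps: int) -> List[float]:
    """Assumed progress target: fraction of the trajectory completed after each step."""
    return [(t + 1) / num_steps for t in range(num_steps)]


def orientation_labels(steps: Sequence[Step]) -> List[float]:
    """Assumed orientation target: heading change to the next step, wrapped to [-pi, pi)."""
    labels = []
    for cur, nxt in zip(steps, steps[1:]):
        delta = nxt.heading - cur.heading
        labels.append((delta + math.pi) % (2 * math.pi) - math.pi)
    return labels


def matching_pairs(num_pairs: int, rng: random.Random) -> List[Tuple[int, int, int]]:
    """Consistency samples as (instruction_idx, trajectory_idx, label).

    Each aligned instruction/trajectory pair is a positive (label 1); pairing the
    instruction with a randomly chosen *different* trajectory gives a negative (label 0).
    """
    if num_pairs < 2:
        raise ValueError("need at least two trajectories to build negatives")
    samples = []
    for i in range(num_pairs):
        samples.append((i, i, 1))
        j = rng.randrange(num_pairs - 1)  # draw from num_pairs - 1 candidates
        if j >= i:
            j += 1  # shift past the aligned trajectory so j != i
        samples.append((i, j, 0))
    return samples


def total_loss(main_loss: float, aux_losses: Dict[str, float], weights: Dict[str, float]) -> float:
    """Hypothetical multi-task objective: main loss plus weighted auxiliary losses."""
    return main_loss + sum(weights[name] * loss for name, loss in aux_losses.items())


if __name__ == "__main__":
    steps = [Step(0.0), Step(math.pi / 2), Step(-3 * math.pi / 4)]
    print(progress_labels(len(steps)))
    print(orientation_labels(steps))
    print(matching_pairs(3, random.Random(0)))
    print(total_loss(1.0, {"progress": 0.5, "angle": 2.0}, {"progress": 0.2, "angle": 0.1}))
```

Keeping each label generator separate from the loss weighting makes it easy to ablate one auxiliary task at a time by setting its weight to zero.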

Tasks

Navigate

Vision-Language Navigation

Datasets

Visual Question Answering

Results from the Paper

Ranked #13 on Vision and Language Navigation on VLN Challenge

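Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Vision and Language Navigation	VLN Challenge	Self-Supervised Auxiliary Reasoning Tasks (Beam Search)	success	0.71	#13
Vision and Language Navigation	VLN Challenge	Self-Supervised Auxiliary Reasoning Tasks (Beam Search)	length (m)	40.85	#17
Vision and Language Navigation	VLN Challenge	Self-Supervised Auxiliary Reasoning Tasks (Beam Search)	error (m)	3.24	#134
Vision and Language Navigation	VLN Challenge	Self-Supervised Auxiliary Reasoning Tasks (Beam Search)	oracle success	0.81	#17
Vision and Language Navigation	VLN Challenge	Self-Supervised Auxiliary Reasoning Tasks (Beam Search)	spl	0.21	#121
Vision and Language Navigation	VLN Challenge	Self-Supervised Auxiliary Reasoning Tasks (Pre-explore)	success	0.68	#27
Vision and Language Navigation	VLN Challenge	Self-Supervised Auxiliary Reasoning Tasks (Pre-explore)	length (m)	10.43	#118
Vision and Language Navigation	VLN Challenge	Self-Supervised Auxiliary Reasoning Tasks (Pre-explore)	error (m)	3.69	#115
Vision and Language Navigation	VLN Challenge	Self-Supervised Auxiliary Reasoning Tasks (Pre-explore)	oracle success	0.75	#32
Vision and Language Navigation	VLN Challenge	Self-Supervised Auxiliary Reasoning Tasks (Pre-explore)	spl	0.65	#5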
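The metrics in the results table follow the usual Room-to-Room (R2R) conventions. An episode succeeds if the agent stops within 3 m of the goal; oracle success asks whether any point on the trajectory came within 3 m; length and error are the mean trajectory length and the mean final distance to the goal, in meters; and SPL (Success weighted by Path Length, Anderson et al., 2018) credits each success with l_i / max(p_i, l_i), where l_i is the shortest-path length and p_i the length of the agent's path. The sketch below computes these averages from per-episode records. `Episode`, its field names and `evaluate` are hypothetical illustrations, not the API of the official evaluation script.

```python
from dataclasses import dataclass
from typing import Dict, Sequence

SUCCESS_RADIUS_M = 3.0  # R2R success threshold, in meters


@dataclass
class Episode:
    """Hypothetical per-episode record; all distances are in meters."""
    final_dist: float    # distance from the stopping point to the goal (navigation error)
    closest_dist: float  # smallest distance to the goal anywhere along the trajectory
    path_len: float      # length of the path the agent actually walked
    shortest_len: float  # shortest-path distance from start to goal


def evaluate(episodes: Sequence[Episode]) -> Dict[str, float]:
    """Average success, oracle success, length, error and SPL over episodes."""
    n = len(episodes)
    if n == 0:
        raise ValueError("no episodes to evaluate")
    success = oracle = length = error = spl = 0.0
    for e in episodes:
        ok = e.final_dist < SUCCESS_RADIUS_M
        success += float(ok)
        oracle += float(e.closest_dist < SUCCESS_RADIUS_M)
        length += e.path_len
        error += e.final_dist
        # SPL scales each success by how close its path is to the shortest one;
        # failed episodes contribute 0.
        if ok:
            spl += e.shortest_len / max(e.path_len, e.shortest_len)
    return {
        "success": success / n,
        "oracle success": oracle / n,
        "length": length / n,
        "error": error / n,
        "spl": spl / n,
    }


if __name__ == "__main__":
    episodes = [
        Episode(final_dist=1.0, closest_dist=0.5, path_len=20.0, shortest_len=10.0),
        Episode(final_dist=5.0, closest_dist=2.0, path_len=12.0, shortest_len=8.0),
    ]
    print(evaluate(episodes))
```

On these two made-up episodes the success rate is 0.5 but SPL is only 0.25, because the one success walked twice the shortest distance. The same effect shows up in the table: beam search explores many candidate routes, so its mean trajectory length (40.85) is nearly four times that of the pre-explore agent (10.43). It reaches the goal slightly more often (0.71 vs 0.68 success), yet its SPL falls to 0.21 versus 0.65.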