Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation

Visually-grounded language understanding and reasoning is one of the most challenging topics in Natural Language Processing (NLP). Outdoor vision-and-language navigation (VLN) is one such task, in which an agent follows natural language instructions to navigate a real-life urban environment. Due to the lack of human-annotated instructions that describe intricate urban scenes, outdoor VLN remains a challenging task. This paper introduces a Multimodal Text Style Transfer (MTST) learning approach that leverages external multimodal resources to mitigate data scarcity in outdoor navigation. We first enrich the navigation data by transferring the style of instructions generated by the Google Maps API, then pre-train the navigator on the augmented external outdoor navigation dataset. Experimental results show that the MTST learning approach is model-agnostic and significantly outperforms the baseline models on the outdoor VLN task, improving the task completion rate on the test set by a relative 8.7%.
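The abstract describes a two-stage pipeline: rewrite machine-generated (Google Maps API style) instructions with a style-transfer model, pre-train the navigator on the augmented external data, then train on the target navigation data. The sketch below is a minimal, hypothetical illustration of that flow under those assumptions; all names (NavSample, style_transfer, Navigator, pretrain, finetune) are illustrative placeholders, not the authors' actual code.

```python
# Hypothetical sketch of the two-stage MTST training pipeline from the abstract.
# Every identifier here is a placeholder; real training logic is stubbed out.
from dataclasses import dataclass
from typing import List


@dataclass
class NavSample:
    panoramas: List[str]   # visual observations along the route (e.g. image paths)
    instruction: str       # natural-language navigation instruction


def style_transfer(template_instruction: str) -> str:
    """Stand-in for the style-transfer model that rewrites template-like,
    machine-generated instructions into richer, human-like instructions."""
    return f"[styled] {template_instruction}"  # a trained model would be called here


class Navigator:
    """Stand-in for a VLN model such as the VLN Transformer."""
    def update(self, sample: NavSample) -> None:
        pass  # a gradient step on (panoramas, instruction) would happen here


def pretrain(navigator: Navigator, external_data: List[NavSample]) -> None:
    """Stage 1: pre-train on style-transferred external navigation data."""
    for sample in external_data:
        navigator.update(sample)


def finetune(navigator: Navigator, target_data: List[NavSample]) -> None:
    """Stage 2: train on the human-annotated target data (e.g. Touchdown)."""
    for sample in target_data:
        navigator.update(sample)


if __name__ == "__main__":
    external = [NavSample(["pano_0.jpg"], "Turn left onto Main St.")]
    # Enrich the machine-generated instructions before pre-training.
    external = [NavSample(s.panoramas, style_transfer(s.instruction)) for s in external]
    model = Navigator()
    pretrain(model, external)
    finetune(model, [NavSample(["pano_1.jpg"], "Walk past the red awning and stop.")])
```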


Datasets

Touchdown

Results from the Paper


Task | Dataset | Model | Metric | Value | Global Rank
Vision and Language Navigation | Touchdown Dataset | VLN Transformer +M-50 +style | Task Completion (TC) | 16.2 | #4
Vision and Language Navigation | Touchdown Dataset | VLN Transformer | Task Completion (TC) | 14.9 | #5
Vision and Language Navigation | Touchdown Dataset | Gated Attention (GA) | Task Completion (TC) | 11.9 | #8
Vision and Language Navigation | Touchdown Dataset | RConcat | Task Completion (TC) | 11.8 | #9
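
As a quick sanity check (not stated on the page itself), the 8.7% relative improvement quoted in the abstract matches the table's task completion scores:

```python
# VLN Transformer +M-50 +style (TC 16.2) vs. the plain VLN Transformer (TC 14.9).
baseline_tc, mtst_tc = 14.9, 16.2
relative_gain = (mtst_tc - baseline_tc) / baseline_tc
print(f"{relative_gain:.1%}")  # -> 8.7%
```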

Methods


Multimodal Text Style Transfer (MTST), VLN Transformer