Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation

We present the Frontier Aware Search with backTracking (FAST) Navigator, a general framework for action decoding that achieves state-of-the-art results on the Room-to-Room (R2R) Vision-and-Language navigation challenge of Anderson et al. (2018). Given a natural language instruction and photo-realistic image views of a previously unseen environment, the agent is tasked with navigating from source to target location as quickly as possible. While all current approaches either make local action decisions or score entire trajectories using beam search, ours balances local and global signals when exploring an unobserved environment. Importantly, this lets us act greedily but use global signals to backtrack when necessary. Applying the FAST framework to existing state-of-the-art models achieves a 17% relative gain, an absolute 6% gain, on Success rate weighted by Path Length (SPL).
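The idea of acting greedily while keeping a frontier to backtrack to can be illustrated with a minimal sketch. This is not the authors' implementation: it is a plain best-first search over a toy graph, in which the cumulative sum of local action scores stands in for the global signal, and a hypothetical `is_goal` callback stands in for the agent's stop decision. The names `frontier_search`, `local_score`, and the toy graph are all assumptions made for illustration.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

Graph = Dict[str, List[str]]


def frontier_search(
    graph: Graph,
    start: str,
    is_goal: Callable[[str], bool],
    local_score: Callable[[str, str], float],
    max_steps: int = 50,
) -> Tuple[Optional[List[str]], List[str]]:
    """Expand the best-scoring partial path; jump back to the frontier if needed.

    Returns (path to the goal or None, order in which nodes were expanded).
    """
    # Frontier entries are (negated cumulative score, path so far).
    # heapq is a min-heap, so scores are negated to pop the best path first.
    frontier: List[Tuple[float, List[str]]] = [(0.0, [start])]
    expanded = set()
    visited_order: List[str] = []

    for _ in range(max_steps):
        if not frontier:
            break
        neg_score, path = heapq.heappop(frontier)
        node = path[-1]
        if node in expanded:
            continue
        expanded.add(node)
        visited_order.append(node)
        if is_goal(node):
            return path, visited_order
        for nxt in graph.get(node, []):
            if nxt not in expanded:
                # Cumulative score grows by the local score of this action.
                new_neg = neg_score - local_score(node, nxt)
                heapq.heappush(frontier, (new_neg, path + [nxt]))
    return None, visited_order


# Toy example (hypothetical scores, e.g. log-probabilities of each action).
toy_graph: Graph = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}
toy_scores = {
    ("A", "B"): -0.1,  # locally most attractive first step
    ("A", "C"): -0.5,
    ("B", "D"): -2.0,  # but the continuation from B scores poorly
    ("C", "E"): -0.2,
}

path, visited_order = frontier_search(
    toy_graph,
    "A",
    is_goal=lambda n: n == "E",
    local_score=lambda a, b: toy_scores[(a, b)],
)
print(path)
print(visited_order)
```

In this toy run the search first follows the locally best action to B, then returns to the frontier and continues through C once the partial path via B scores worse overall. A faithful reproduction would replace the summed local scores with the learned signals described in the paper.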

CVPR 2019
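Task                            Dataset        Model                    Metric Name     Metric Value  Global Rank
Vision-Language Navigation      Room2Room      Tactical Rewind - short  spl             0.41          # 3
Vision and Language Navigation  VLN Challenge  Tactical Rewind - short  success         0.54          # 76
Vision and Language Navigation  VLN Challenge  Tactical Rewind - short  length          22.08         # 17
Vision and Language Navigation  VLN Challenge  Tactical Rewind - short  error           5.14          # 48
Vision and Language Navigation  VLN Challenge  Tactical Rewind - short  oracle success  0.64          # 68
Vision and Language Navigation  VLN Challenge  Tactical Rewind - short  spl             0.41          # 79
Vision and Language Navigation  VLN Challenge  Tactical Rewind - long   success         0.61          # 47
Vision and Language Navigation  VLN Challenge  Tactical Rewind - long   length          196.53        # 13
Vision and Language Navigation  VLN Challenge  Tactical Rewind - long   error           4.29          # 72
Vision and Language Navigation  VLN Challenge  Tactical Rewind - long   oracle success  0.9           # 14
Vision and Language Navigation  VLN Challenge  Tactical Rewind - long   spl             0.03          # 110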
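
The spl rows can be read with the metric's usual definition in mind: each episode contributes its success indicator times the ratio of the shortest-path length to the larger of the shortest-path length and the length actually travelled, and the result is averaged over episodes. The sketch below assumes that commonly used definition; the function name `spl`, its argument names, and the handling of zero-length episodes are illustrative choices, and the numbers are made up rather than taken from the table.

```python
from typing import Sequence


def spl(
    successes: Sequence[int],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over episodes."""
    if not successes:
        raise ValueError("at least one episode is required")
    total = 0.0
    for s, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        denom = max(taken, shortest)
        # Assumption: an episode with zero-length paths counts at full weight.
        ratio = shortest / denom if denom > 0 else 1.0
        total += s * ratio
    return total / len(successes)


# Hypothetical episodes: an optimal success, a failure, and a success
# that travelled twice the shortest-path length.
example = spl(
    successes=[1, 0, 1],
    shortest_lengths=[10.0, 8.0, 5.0],
    path_lengths=[10.0, 20.0, 10.0],
)
print(example)  # (1.0 + 0.0 + 0.5) / 3
```

Because the ratio shrinks as the travelled path grows, an agent can reach the target often and still score a low spl when its trajectories are very long, which is consistent with the "long" entry pairing a higher success with a much larger length and a much lower spl.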
