Vision-Language Navigation

31 papers with code • 1 benchmark • 7 datasets

Vision-language navigation (VLN) is the task of guiding an embodied agent through real 3D environments to carry out natural language instructions.

(Image credit: Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout)
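To make the task setup concrete, here is a minimal sketch of a single VLN episode on a discrete navigation graph (as in R2R-style benchmarks). The `env` and `agent` interfaces below are hypothetical placeholders for illustration, not any particular simulator's API.

```python
# Minimal sketch of one VLN episode on a discrete navigation graph.
# `env` and `agent` are hypothetical interfaces used only for illustration.

def run_episode(env, agent, instruction, max_steps=30, success_radius=3.0):
    """Follow a natural language instruction until the agent stops or times out."""
    obs = env.reset()                       # panoramic observation at the start viewpoint
    agent.reset(instruction)                # encode the instruction once per episode
    for _ in range(max_steps):
        # Candidate actions: navigable viewpoints visible from here, plus STOP.
        candidates = env.navigable_viewpoints() + ["STOP"]
        action = agent.act(obs, candidates)
        if action == "STOP":
            break
        obs = env.step(action)              # move to the chosen adjacent viewpoint
    # Success is commonly measured by distance to the goal at stopping time.
    return env.distance_to_goal() < success_radius
```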

Most implemented papers

Structured Scene Memory for Vision-Language Navigation

HanqingWangAI/SSM-VLN CVPR 2021

Recently, numerous algorithms have been developed to tackle the problem of vision-language navigation (VLN), i.e., requiring an agent to navigate 3D environments by following linguistic instructions.

The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation

yuankaiqi/orist ICCV 2021

Vision-and-Language Navigation (VLN) requires an agent to find a path to a remote location on the basis of natural-language instructions and a set of photo-realistic panoramas.

Improving Cross-Modal Alignment in Vision Language Navigation via Syntactic Information

jialuli-luka/SyntaxVLN NAACL 2021

One key challenge in this task is to ground instructions with the current visual information that the agent perceives.
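As a rough illustration of such grounding, the sketch below attends over instruction word features with the current visual feature; the shapes and function names are illustrative assumptions, not the paper's model.

```python
# Sketch of grounding: soft attention from the current visual feature over the
# instruction tokens, so the agent focuses on the words relevant to what it sees.

import numpy as np

def ground_instruction(visual_feat, word_feats):
    """Attend over instruction word features using the current visual feature.

    visual_feat: (d,) feature of the current view
    word_feats:  (num_words, d) features of the instruction tokens
    """
    scores = word_feats @ visual_feat        # similarity score per word
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax over the instruction words
    return weights @ word_feats              # grounded text context vector
```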

Vision-Language Navigation with Random Environmental Mixup

lcfractal/vlnrem ICCV 2021

Then, we cross-connect the key views of different scenes to construct augmented scenes.
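A toy sketch of that cross-connection step, under the assumption that each scene is an adjacency-dict connectivity graph; the key-view choice here is a placeholder rather than the paper's selection criterion.

```python
# Splice the connectivity graphs of two scenes at selected "key views"
# to build an augmented scene. Graphs are plain adjacency dicts.

def mixup_scenes(graph_a, graph_b, key_a, key_b):
    """Cross-connect scene A and scene B at the chosen key viewpoints.

    graph_a / graph_b: dict mapping viewpoint id -> set of neighbour ids
    key_a / key_b:     viewpoint ids to be cross-connected
    """
    augmented = {}
    # Keep both scenes' edges, namespacing ids to avoid collisions.
    for prefix, graph in (("A:", graph_a), ("B:", graph_b)):
        for node, neighbours in graph.items():
            augmented[prefix + node] = {prefix + n for n in neighbours}
    # Cross-connect the key views so paths can flow between the two scenes.
    augmented["A:" + key_a].add("B:" + key_b)
    augmented["B:" + key_b].add("A:" + key_a)
    return augmented

# Example: two toy scenes joined at viewpoints "v2" and "u1".
scene_a = {"v1": {"v2"}, "v2": {"v1", "v3"}, "v3": {"v2"}}
scene_b = {"u1": {"u2"}, "u2": {"u1"}}
augmented = mixup_scenes(scene_a, scene_b, "v2", "u1")
```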

Adversarial Reinforced Instruction Attacker for Robust Vision-Language Navigation

expectorlin/DR-Attacker 23 Jul 2021

Specifically, we propose a Dynamic Reinforced Instruction Attacker (DR-Attacker), which learns to mislead the navigator to move to the wrong target by destroying the most instructive information in instructions at different timesteps.
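A rough sketch of the perturbation itself, with the attacked token chosen greedily by attention weight; the real DR-Attacker instead learns this choice via reinforcement learning, and `MASK_TOKEN` is an illustrative assumption.

```python
# At each timestep, destroy the instruction token the navigator currently
# relies on most (approximated here by its attention weights).

import numpy as np

MASK_TOKEN = "[MASK]"

def attack_instruction(tokens, attention_weights):
    """Replace the most-attended instruction token with a mask token.

    tokens:            list of instruction words
    attention_weights: per-token weights from the navigator at this timestep
    """
    target = int(np.argmax(attention_weights))
    attacked = list(tokens)
    attacked[target] = MASK_TOKEN
    return attacked

# Example: the navigator attends most to "left", so that word gets destroyed.
tokens = ["turn", "left", "at", "the", "sofa"]
weights = np.array([0.1, 0.5, 0.05, 0.05, 0.3])
print(attack_instruction(tokens, weights))  # ['turn', '[MASK]', 'at', 'the', 'sofa']
```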

Contrastive Instruction-Trajectory Learning for Vision-Language Navigation

liangcici/CITL-VLN 8 Dec 2021

The vision-language navigation (VLN) task requires an agent to reach a target with the guidance of natural language instruction.

A Dataset for Interactive Vision-Language Navigation with Unknown Command Feasibility

aburns4/MoTIF 4 Feb 2022

To study VLN with unknown command feasibility, we introduce a new dataset, Mobile app Tasks with Iterative Feedback (MoTIF), where the goal is to complete a natural language command in a mobile app.

Visual-Language Navigation Pretraining via Prompt-based Environmental Self-exploration

liangcici/probes-vln ACL 2022

To improve the ability of fast cross-domain adaptation, we propose Prompt-based Environmental Self-exploration (ProbES), which can self-explore the environments by sampling trajectories and automatically generate structured instructions via a large-scale cross-modal pretrained model (CLIP).
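A loose sketch of that self-exploration loop, assuming a hypothetical `clip_rank(image, labels)` scorer and a fixed sentence template; ProbES itself uses learned prompts with CLIP, so treat this only as the shape of the pipeline.

```python
# Sample a random trajectory, label each visited view, and fill a template
# to obtain a synthetic instruction for pretraining.

import random

ROOM_LABELS = ["kitchen", "bedroom", "bathroom", "living room", "hallway"]

def sample_trajectory(graph, start, length=4):
    """Random walk over the navigation graph's viewpoint ids."""
    path = [start]
    for _ in range(length - 1):
        path.append(random.choice(sorted(graph[path[-1]])))
    return path

def generate_instruction(path, views, clip_rank):
    """Turn a sampled trajectory into a templated instruction.

    views:     dict mapping viewpoint id -> image of that view
    clip_rank: hypothetical callable (image, candidate_labels) -> best label
    """
    rooms = [clip_rank(views[v], ROOM_LABELS) for v in path]
    middle = ", ".join(rooms[1:-1]) or "hallway"
    return "Walk from the {} to the {}, passing the {}.".format(rooms[0], rooms[-1], middle)
```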

Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-Language Navigation

hanqingwangai/ccc-vln CVPR 2022

Since the rise of vision-language navigation (VLN), great progress has been made in instruction following -- building a follower to navigate environments under the guidance of instructions.

Reinforced Structured State-Evolution for Vision-Language Navigation

chenjinyubuaa/sevol CVPR 2022

However, the crucial navigation clues (i.e., object-level environment layout) for the embodied navigation task are discarded, since the maintained vector is essentially unstructured.
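To illustrate the contrast with an unstructured vector, here is a minimal sketch of an object-level layout state that accumulates detections per viewpoint; the class and field names are assumptions, not the paper's implementation.

```python
# Keep an explicit object-level layout graph as the navigation state,
# instead of compressing the history into a single unstructured vector.

from dataclasses import dataclass, field

@dataclass
class LayoutState:
    """Structured navigation state: objects seen so far and where they were seen."""
    objects: dict = field(default_factory=dict)   # object id -> feature vector
    edges: set = field(default_factory=set)       # (object id, viewpoint id) pairs

    def update(self, viewpoint_id, detections):
        """Fold this step's object detections into the layout graph."""
        for obj_id, feature in detections.items():
            self.objects[obj_id] = feature
            self.edges.add((obj_id, viewpoint_id))

# Example: two steps of observations accumulate into one structured state.
state = LayoutState()
state.update("v1", {"sofa": [0.2, 0.8], "lamp": [0.5, 0.1]})
state.update("v2", {"table": [0.9, 0.3]})
```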