Vision-Language Navigation

28 papers with code • 1 benchmark • 7 datasets

Vision-language navigation (VLN) is the task in which an embodied agent follows natural language instructions to navigate inside real 3D environments.
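
As a rough illustration of the task setup, the sketch below runs a single VLN episode on a toy navigation graph: the agent receives an instruction and repeatedly moves to a neighbouring viewpoint until it issues a stop action. The graph, the `RandomAgent`, and the success check are hypothetical stand-ins, not the API of any specific simulator.

```python
# Minimal, self-contained sketch of a VLN episode. Everything here is a toy
# stand-in: a real setup would use a photo-realistic simulator and an agent
# that grounds the instruction in visual features.
import random

# Toy navigation graph: viewpoint -> neighbouring viewpoints.
GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B"],
}

class RandomAgent:
    def act(self, instruction, viewpoint):
        # A real agent would condition on the instruction and visual features here.
        return random.choice(GRAPH[viewpoint] + ["STOP"])

def run_episode(agent, instruction, start="A", goal="D", max_steps=10):
    viewpoint, path = start, [start]
    for _ in range(max_steps):
        action = agent.act(instruction, viewpoint)
        if action == "STOP":
            break
        viewpoint = action
        path.append(viewpoint)
    return path, viewpoint == goal     # trajectory and whether the goal was reached

path, success = run_episode(RandomAgent(), "Walk to the bedroom and stop.")
print(path, success)
```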

(Image credit: Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout)

Most implemented papers

The Regretful Agent: Heuristic-Aided Navigation through Progress Estimation

chihyaoma/regretful-agent CVPR 2019

As deep learning continues to make progress for challenging perception tasks, there is increased interest in combining vision, language, and decision-making.

Self-Monitoring Navigation Agent via Auxiliary Progress Estimation

chihyaoma/selfmonitoring-agent ICLR 2019

The Vision-and-Language Navigation (VLN) task entails an agent following navigational instructions in photo-realistic unknown environments.

Cross-Lingual Vision-Language Navigation

zzxslp/Crosslingual-VLN 24 Oct 2019

Commanding a robot to navigate with natural language instructions is a long-term goal for grounded language understanding and robotics.

Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and-Language Navigation

peteanderson80/Matterport3DSimulator ECCV 2018

In this paper, we take a radical approach to bridging the gap between synthetic studies and real-world practice: we propose a novel, planned-ahead hybrid reinforcement learning model that combines model-free and model-based reinforcement learning to solve a real-world vision-language navigation task.
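
The excerpt above describes mixing model-free and model-based decisions; a minimal sketch of one way such a planned-ahead hybrid could work is shown below. The toy `policy_score`, `env_model`, rollout policy, and mixing weight are all illustrative assumptions, not the paper's actual architecture.

```python
# Hedged sketch of a planned-ahead hybrid policy: for each candidate action,
# roll out a (learned) environment model a few steps to estimate a look-ahead
# value, then mix that with the model-free policy score.
import random

ACTIONS = ["forward", "left", "right"]

def policy_score(state, action):
    # Stand-in for a model-free policy's score for (state, action).
    return {"forward": 1.0, "left": 0.2, "right": 0.1}[action]

def env_model(state, action):
    # Stand-in for a learned transition/reward model: next state and reward.
    return state + 1, (1.0 if action == "forward" else 0.0)

def lookahead_value(state, action, depth=3):
    state, reward = env_model(state, action)
    total = reward
    for _ in range(depth - 1):
        state, reward = env_model(state, random.choice(ACTIONS))  # cheap rollout policy
        total += reward
    return total

def hybrid_act(state, mix=0.5):
    scores = {a: (1 - mix) * policy_score(state, a) + mix * lookahead_value(state, a)
              for a in ACTIONS}
    return max(scores, key=scores.get)

print(hybrid_act(state=0))   # usually "forward"
```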

Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation

Kelym/FAST CVPR 2019

We present the Frontier Aware Search with backTracking (FAST) Navigator, a general framework for action decoding that achieves state-of-the-art results on the Room-to-Room (R2R) Vision-and-Language Navigation challenge of Anderson et al.
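
A hedged sketch of the general idea behind frontier-aware search with backtracking: keep a global frontier of partial paths, always expand the best-scoring one (which can mean jumping back to an earlier viewpoint), and return the best path found within a budget. The toy graph and the hand-written `score` function below stand in for the learned progress and action scores used by FAST.

```python
import heapq

GRAPH = {
    "start": ["hall", "kitchen"],
    "hall": ["start", "bedroom"],
    "kitchen": ["start"],
    "bedroom": ["hall"],
}

# Toy stand-in for a learned scorer: prefer paths that end closer to the goal.
DIST_TO_GOAL = {"start": 2, "hall": 1, "kitchen": 3, "bedroom": 0}

def score(path):
    return -DIST_TO_GOAL[path[-1]] - 0.1 * len(path)   # progress minus a length penalty

def fast_search(start="start", goal="bedroom", budget=10):
    frontier = [(-score([start]), [start])]   # max-heap via negated scores
    best = [start]
    for _ in range(budget):
        if not frontier:
            break
        _, path = heapq.heappop(frontier)     # may backtrack to an older viewpoint
        if score(path) > score(best):
            best = path
        if path[-1] == goal:
            return path
        for nxt in GRAPH[path[-1]]:
            if nxt not in path:               # avoid trivial loops
                new_path = path + [nxt]
                heapq.heappush(frontier, (-score(new_path), new_path))
    return best

print(fast_search())   # e.g. ['start', 'hall', 'bedroom']
```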

Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout

airsplay/R2R-EnvDrop NAACL 2019

Next, we apply semi-supervised learning (via back-translation) on these dropped-out environments to generate new paths and instructions.
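
A hedged sketch of the environmental-dropout idea referenced above: sample one feature dropout mask per environment and apply it to every view, so the agent effectively trains in a consistent "new" environment; back-translation then generates instructions for paths sampled in these augmented environments. The array shapes and the NumPy implementation are assumptions for illustration; the released code (airsplay/R2R-EnvDrop) implements this inside a PyTorch model.

```python
import numpy as np

def environmental_dropout(env_features, drop_rate=0.5, rng=None):
    """env_features: (num_views, feature_dim) visual features of one environment."""
    rng = rng or np.random.default_rng()
    num_views, feat_dim = env_features.shape
    # One mask shared by all views of this environment (scaled like inverted dropout).
    mask = (rng.random(feat_dim) > drop_rate) / (1.0 - drop_rate)
    return env_features * mask                    # broadcast over views

features = np.random.rand(36, 2048)               # e.g. 36 panorama views
augmented = environmental_dropout(features, 0.4)
# Back-translation step (not shown): sample new paths in the augmented
# environment, have a trained "speaker" generate instructions for them, and
# train the follower on these synthetic (path, instruction) pairs.
```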

Environment-agnostic Multitask Learning for Natural Language Grounded Navigation

google-research/valan ECCV 2020

Recent research efforts enable the study of natural language grounded navigation in photo-realistic environments, e.g., following natural language instructions or dialog.

Active Visual Information Gathering for Vision-Language Navigation

HanqingWangAI/Active_VLN ECCV 2020

Vision-language navigation (VLN) is the task of directing an agent to carry out navigational instructions inside photo-realistic environments.

A modular vision language navigation and manipulation framework for long horizon compositional tasks in indoor environment

Homagn/MOVILAN 19 Jan 2021

In this paper, we propose a new framework, MoViLan (Modular Vision and Language), for executing visually grounded natural language instructions for day-to-day indoor household tasks.