Vision and Language Navigation

88 papers with code • 5 benchmarks • 13 datasets

In Vision-and-Language Navigation (VLN), an embodied agent must follow a natural language instruction to reach a target destination, typically in a photo-realistic indoor environment such as Room-to-Room (R2R).

Most implemented papers

Speaker-Follower Models for Vision-and-Language Navigation

ronghanghu/speaker_follower NeurIPS 2018

We use this speaker model to (1) synthesize new instructions for data augmentation and to (2) implement pragmatic reasoning, which evaluates how well candidate action sequences explain an instruction.
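
A minimal sketch of the pragmatic reranking idea: score each candidate route both by the follower's probability of producing it and by how well a speaker model explains the instruction from it. The `follower_logprob`/`speaker_logprob` interfaces and the `alpha` weight are hypothetical, not the repository's API.

```python
from typing import Callable, List, Sequence

# Hypothetical scoring interfaces: each returns a log-probability.
FollowerScore = Callable[[Sequence[str], Sequence[int]], float]  # log P(route | instruction)
SpeakerScore = Callable[[Sequence[int], Sequence[str]], float]   # log P(instruction | route)

def pragmatic_rerank(
    instruction: Sequence[str],
    candidate_routes: List[Sequence[int]],
    follower_logprob: FollowerScore,
    speaker_logprob: SpeakerScore,
    alpha: float = 0.5,
) -> Sequence[int]:
    """Pick the candidate route with the best combined follower/speaker score.

    The speaker term asks: how well does this route *explain* the instruction?
    alpha trades off the two models (the value here is an assumption).
    """
    def score(route: Sequence[int]) -> float:
        return (1 - alpha) * follower_logprob(instruction, route) \
             + alpha * speaker_logprob(route, instruction)

    return max(candidate_routes, key=score)
```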

The Regretful Navigation Agent for Vision-and-Language Navigation

chihyaoma/regretful-agent CVPR 2019 (Oral)

As deep learning continues to make progress for challenging perception tasks, there is increased interest in combining vision, language, and decision-making.

Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation

Kelym/FAST CVPR 2019

We present the Frontier Aware Search with backTracking (FAST) Navigator, a general framework for action decoding that achieves state-of-the-art results on the Room-to-Room (R2R) Vision-and-Language Navigation challenge of Anderson et al. (2018).
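
As a rough illustration of frontier-aware decoding with backtracking, the sketch below keeps every partial trajectory on one global priority queue scored by instruction fit; expanding the globally best entry implicitly backtracks to earlier states. The `neighbors`, `partial_score`, and `should_stop` callables are hypothetical, not FAST's actual interfaces.

```python
import heapq
from typing import Callable, Iterable, List, Tuple

Node = str  # placeholder: an environment viewpoint id

def frontier_search_decode(
    start: Node,
    neighbors: Callable[[Node], Iterable[Node]],
    partial_score: Callable[[Tuple[Node, ...]], float],  # higher = better fit
    should_stop: Callable[[Tuple[Node, ...]], bool],
    max_expansions: int = 100,
) -> Tuple[Node, ...]:
    """Best-first search over partial trajectories (a FAST-flavoured sketch).

    Because all partial paths share one frontier, the agent can 'backtrack':
    whenever a new branch scores better than extending the current path,
    that branch is expanded next.
    """
    frontier: List[Tuple[float, Tuple[Node, ...]]] = [(-partial_score((start,)), (start,))]
    best = (start,)
    for _ in range(max_expansions):
        if not frontier:
            break
        _, path = heapq.heappop(frontier)
        best = path
        if should_stop(path):
            return path
        for nxt in neighbors(path[-1]):
            if nxt in path:  # avoid trivial loops
                continue
            new_path = path + (nxt,)
            heapq.heappush(frontier, (-partial_score(new_path), new_path))
    return best
```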

REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments

YuankaiQi/REVERIE CVPR 2020

One of the long-term challenges of robotics is to enable robots to interact with humans in the visual world via natural language, as humans are visual animals that communicate through language.

Chasing Ghosts: Instruction Following as Bayesian State Tracking

batra-mlp-lab/vln-chasing-ghosts NeurIPS 2019

Our experiments show that our approach outperforms a strong LingUNet baseline when predicting the goal location on the map.
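
Instruction following as Bayesian state tracking reduces to a recursive predict/update over a discretised map. Below is a generic discrete Bayes filter step with toy numbers, not the paper's LingUNet-based model.

```python
import numpy as np

def bayes_filter_step(belief: np.ndarray,
                      transition: np.ndarray,
                      likelihood: np.ndarray) -> np.ndarray:
    """One predict/update step of a discrete Bayes filter over map cells.

    belief:     (N,) prior over N discretised map locations
    transition: (N, N) motion model, transition[i, j] = P(next=j | cur=i)
    likelihood: (N,) P(current observation | location)
    """
    predicted = belief @ transition      # predict: push belief through motion model
    posterior = predicted * likelihood   # update: weight by observation likelihood
    return posterior / posterior.sum()   # renormalise

# Tiny usage example on a 3-cell map.
belief = np.array([1.0, 0.0, 0.0])
transition = np.array([[0.1, 0.9, 0.0],
                       [0.0, 0.1, 0.9],
                       [0.0, 0.0, 1.0]])
likelihood = np.array([0.2, 0.7, 0.1])
print(bayes_filter_step(belief, transition, likelihood))
```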

Embodied Vision-and-Language Navigation with Dynamic Convolutional Filters

aimagelab/DynamicConv-agent 5 Jul 2019

In Vision-and-Language Navigation (VLN), an embodied agent needs to reach a target destination guided only by a natural language instruction.
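
The dynamic-filter idea named in the title can be sketched as generating 1x1 convolution kernels from the instruction embedding and convolving them with the visual features; all sizes and the module layout below are assumptions, not the repository's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv(nn.Module):
    """Generate 1x1 conv filters from an instruction embedding and apply
    them to visual features (shapes and sizes here are assumptions)."""

    def __init__(self, text_dim: int, vis_channels: int, n_filters: int = 8):
        super().__init__()
        self.vis_channels = vis_channels
        self.n_filters = n_filters
        # Maps the sentence embedding to n_filters 1x1 kernels.
        self.kernel_gen = nn.Linear(text_dim, n_filters * vis_channels)

    def forward(self, vis_feat: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # vis_feat: (C, H, W) for one observation; text_emb: (text_dim,)
        kernels = self.kernel_gen(text_emb)
        kernels = kernels.view(self.n_filters, self.vis_channels, 1, 1)
        kernels = F.normalize(kernels.flatten(1), dim=1).view_as(kernels)
        # Each dynamic filter yields one instruction-conditioned response map.
        return F.conv2d(vis_feat.unsqueeze(0), kernels).squeeze(0)  # (n_filters, H, W)

attn = DynamicConv(text_dim=256, vis_channels=512)
maps = attn(torch.randn(512, 7, 7), torch.randn(256))  # -> (8, 7, 7)
```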

Robust Navigation with Language Pretraining and Stochastic Sampling

xjli/r2r_vln IJCNLP 2019

Core to the vision-and-language navigation (VLN) challenge is building robust instruction representations and action decoding schemes, which can generalize well to previously unseen instructions and environments.
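
The stochastic-sampling part of the title amounts to drawing training-time actions from the policy distribution instead of taking the argmax (or the ground-truth action), which exposes the agent to its own off-path states. A minimal sketch, with the interface assumed:

```python
import torch

def sample_action(logits: torch.Tensor, stochastic: bool = True) -> torch.Tensor:
    """Pick the next navigation action from policy logits.

    With stochastic=True the action is sampled from the categorical
    distribution; with stochastic=False it is the greedy argmax.
    """
    if stochastic:
        return torch.distributions.Categorical(logits=logits).sample()
    return logits.argmax(dim=-1)

# Usage: logits over, say, 6 candidate directions.
action = sample_action(torch.randn(6))
```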

Multimodal Attention Networks for Low-Level Vision-and-Language Navigation

aimagelab/perceive-transform-and-act 27 Nov 2019

Vision-and-Language Navigation (VLN) is a challenging task in which an agent needs to follow a language-specified path to reach a target destination.
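
Multimodal attention here broadly means letting the agent's current state attend over instruction token encodings; the generic scaled dot-product sketch below (dimensions assumed) is an illustration of that pattern, not this paper's architecture.

```python
import torch
import torch.nn.functional as F

def cross_modal_attention(query: torch.Tensor,
                          keys: torch.Tensor,
                          values: torch.Tensor) -> torch.Tensor:
    """Scaled dot-product attention from one modality onto another.

    query:  (d,)   e.g. the current visual/agent state
    keys:   (T, d) e.g. instruction token encodings
    values: (T, d)
    Returns the instruction context vector attended by the current state.
    """
    scores = keys @ query / query.shape[0] ** 0.5  # (T,)
    weights = F.softmax(scores, dim=0)
    return weights @ values                        # (d,)

ctx = cross_modal_attention(torch.randn(128), torch.randn(10, 128), torch.randn(10, 128))
```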

VALAN: Vision and Language Agent Navigation

google-research/valan 6 Dec 2019

VALAN is a lightweight and scalable software framework for deep reinforcement learning based on the SEED RL architecture.
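
SEED RL's defining trait is centralized inference: actors only step environments and ship observations to a learner that picks actions and trains. The toy single-actor sketch below shows that message flow only; it is not VALAN's API, and a real framework would batch inference and route actions back per actor.

```python
import queue
import threading

obs_q = queue.Queue()  # actor -> learner: observations
act_q = queue.Queue()  # learner -> actor: chosen actions

def actor(steps):
    obs = 0.0                 # placeholder observation
    for _ in range(steps):
        obs_q.put(obs)        # send the observation to the learner
        action = act_q.get()  # block until the learner replies
        obs += action         # placeholder environment transition

def learner(steps):
    for _ in range(steps):
        obs = obs_q.get()
        act_q.put(1)          # placeholder: centralized policy inference
        # a real learner would also store (obs, action) for training

t = threading.Thread(target=actor, args=(5,))
t.start()
learner(5)
t.join()
```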

Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training

weituo12321/PREVALENT CVPR 2020

By training on a large amount of image-text-action triplets in a self-supervised learning manner, the pre-trained model provides generic representations of visual environments and language instructions.
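
A pretraining objective of this flavour can be sketched as masked-word prediction plus action prediction on fused image-text features; the head sizes and the [CLS]-like first slot below are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class TripletPretrainLoss(nn.Module):
    """Sketch of a self-supervised objective over image-text-action triplets:
    masked word prediction plus next-action prediction from fused features."""

    def __init__(self, vocab: int = 1000, hidden: int = 256, n_actions: int = 6):
        super().__init__()
        self.word_head = nn.Linear(hidden, vocab)        # recover masked tokens
        self.action_head = nn.Linear(hidden, n_actions)  # predict the expert action
        self.ce = nn.CrossEntropyLoss(ignore_index=-100)

    def forward(self, fused: torch.Tensor, word_targets: torch.Tensor,
                action_target: torch.Tensor) -> torch.Tensor:
        # fused: (T, hidden) cross-modal encoder outputs for one triplet;
        # word_targets: (T,) with -100 everywhere except masked positions.
        mlm = self.ce(self.word_head(fused), word_targets)
        act = self.ce(self.action_head(fused[0:1]), action_target)  # [CLS]-like slot
        return mlm + act

loss_fn = TripletPretrainLoss()
loss = loss_fn(torch.randn(12, 256),
               torch.full((12,), -100).scatter(0, torch.tensor([3]), 42),
               torch.tensor([2]))
```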