Vision and Language Navigation
88 papers with code • 5 benchmarks • 13 datasets
Libraries
Use these libraries to find Vision and Language Navigation models and implementations

Most implemented papers
Speaker-Follower Models for Vision-and-Language Navigation
We use this speaker model to (1) synthesize new instructions for data augmentation and to (2) implement pragmatic reasoning, which evaluates how well candidate action sequences explain an instruction.
The Regretful Navigation Agent for Vision-and-Language Navigation
As deep learning continues to make progress for challenging perception tasks, there is increased interest in combining vision, language, and decision-making.
Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation
We present the Frontier Aware Search with backTracking (FAST) Navigator, a general framework for action decoding that achieves state-of-the-art results on the Room-to-Room (R2R) Vision-and-Language Navigation challenge of Anderson et al.
REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments
One of the long-term challenges of robotics is to enable robots to interact with humans in the visual world via natural language, as humans are visual animals that communicate through language.
Chasing Ghosts: Instruction Following as Bayesian State Tracking
Our experiments show that our approach outperforms a strong LingUNet baseline when predicting the goal location on the map.
Embodied Vision-and-Language Navigation with Dynamic Convolutional Filters
In Vision-and-Language Navigation (VLN), an embodied agent needs to reach a target destination with the only guidance of a natural language instruction.
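The task setup above can be illustrated with a toy episode loop. Everything here (the corridor environment, the trivial policy, and the `navigate` helper) is a hypothetical sketch for intuition only, not any listed paper's implementation:

```python
# Toy sketch of a VLN episode: an agent receives an instruction and
# acts until it issues "stop". The 1-D corridor world and the trivial
# policy below are hypothetical placeholders.

ACTIONS = ["forward", "turn_left", "turn_right", "stop"]

def navigate(instruction, start, goal, max_steps=10):
    """Follow a toy 1-D corridor: move forward until reaching the goal index."""
    pos, trajectory = start, []
    for _ in range(max_steps):
        # A real agent would condition on the instruction and visual
        # observations; this stand-in policy just compares positions.
        action = "forward" if pos < goal else "stop"
        trajectory.append(action)
        if action == "stop":
            break
        pos += 1
    return pos, trajectory

pos, traj = navigate("walk to the end of the corridor", start=0, goal=3)
```

In real VLN benchmarks such as R2R, the environment is a graph of panoramic viewpoints and success is judged by stopping within a fixed distance of the goal, but the episode structure (instruction in, action sequence out, explicit stop action) is the same.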
Robust Navigation with Language Pretraining and Stochastic Sampling
Core to the vision-and-language navigation (VLN) challenge is building robust instruction representations and action decoding schemes, which can generalize well to previously unseen instructions and environments.
Multimodal Attention Networks for Low-Level Vision-and-Language Navigation
Vision-and-Language Navigation (VLN) is a challenging task in which an agent needs to follow a language-specified path to reach a target destination.
VALAN: Vision and Language Agent Navigation
VALAN is a lightweight and scalable software framework for deep reinforcement learning based on the SEED RL architecture.
Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training
By training on a large amount of image-text-action triplets in a self-supervised learning manner, the pre-trained model provides generic representations of visual environments and language instructions.