Vision and Language Navigation

78 papers with code • 5 benchmarks • 12 datasets

This task has no description! Would you like to contribute one?


Use these libraries to find Vision and Language Navigation models and implementations

Most implemented papers

Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

peteanderson80/Matterport3DSimulator CVPR 2018

This is significant because a robot interpreting a natural-language navigation instruction on the basis of what it sees is carrying out a vision and language process that is similar to Visual Question Answering.

Touchdown: Natural Language Navigation and Spatial Reasoning in Visual Street Environments

lil-lab/touchdown CVPR 2019

We study the problem of jointly reasoning about language and vision through a navigation and spatial reasoning task.

Retouchdown: Adding Touchdown to StreetLearn as a Shareable Resource for Language Grounding Tasks in Street View

google-research/valan 10 Jan 2020

These have been added to the StreetLearn dataset and can be obtained via the same process as used previously for StreetLearn.

How Much Can CLIP Benefit Vision-and-Language Tasks?

clip-vil/CLIP-ViL 13 Jul 2021

Most existing Vision-and-Language (V&L) models rely on pre-trained visual encoders, using a relatively small set of manually-annotated data (as compared to web-crawled data), to perceive the visual world.

The Regretful Agent: Heuristic-Aided Navigation through Progress Estimation

chihyaoma/regretful-agent CVPR 2019

As deep learning continues to make progress for challenging perception tasks, there is increased interest in combining vision, language, and decision-making.

Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments

jacobkrantz/VLN-CE ECCV 2020

We develop a language-guided navigation task set in a continuous 3D environment where agents must execute low-level actions to follow natural language navigation directions.

Self-Monitoring Navigation Agent via Auxiliary Progress Estimation

chihyaoma/selfmonitoring-agent ICLR 2019

The Vision-and-Language Navigation (VLN) task entails an agent following navigational instruction in photo-realistic unknown environments.

Airbert: In-domain Pretraining for Vision-and-Language Navigation

airbert-vln/airbert ICCV 2021

Given the scarcity of domain-specific training data and the high diversity of image and language inputs, the generalization of VLN agents to unseen environments remains challenging.

Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and-Language Navigation

peteanderson80/Matterport3DSimulator ECCV 2018

In this paper, we take a radical approach to bridge the gap between synthetic studies and real-world practices---We propose a novel, planned-ahead hybrid reinforcement learning model that combines model-free and model-based reinforcement learning to solve a real-world vision-language navigation task.