Vision and Language Navigation

104 papers with code • 5 benchmarks • 13 datasets

Most implemented papers

Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

peteanderson80/Matterport3DSimulator CVPR 2018

A robot that interprets a natural-language navigation instruction on the basis of what it sees is carrying out a vision-and-language process similar to Visual Question Answering.

Touchdown: Natural Language Navigation and Spatial Reasoning in Visual Street Environments

lil-lab/touchdown CVPR 2019

We study the problem of jointly reasoning about language and vision through a navigation and spatial reasoning task.

Retouchdown: Adding Touchdown to StreetLearn as a Shareable Resource for Language Grounding Tasks in Street View

google-research/valan 10 Jan 2020

The Street View panoramas needed for Touchdown have been added to the StreetLearn dataset and can be obtained via the same process previously used for StreetLearn.

Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments

jacobkrantz/VLN-CE ECCV 2020

We develop a language-guided navigation task set in a continuous 3D environment where agents must execute low-level actions to follow natural language navigation directions.

How Much Can CLIP Benefit Vision-and-Language Tasks?

clip-vil/CLIP-ViL 13 Jul 2021

Most existing Vision-and-Language (V&L) models perceive the visual world through visual encoders pre-trained on a relatively small set of manually annotated data (compared to web-crawled data).

The Regretful Agent: Heuristic-Aided Navigation through Progress Estimation

chihyaoma/regretful-agent CVPR 2019

As deep learning continues to make progress for challenging perception tasks, there is increased interest in combining vision, language, and decision-making.

Self-Monitoring Navigation Agent via Auxiliary Progress Estimation

chihyaoma/selfmonitoring-agent ICLR 2019

The Vision-and-Language Navigation (VLN) task entails an agent following navigational instructions in unknown, photo-realistic environments.

Airbert: In-domain Pretraining for Vision-and-Language Navigation

airbert-vln/airbert ICCV 2021

Given the scarcity of domain-specific training data and the high diversity of image and language inputs, the generalization of VLN agents to unseen environments remains challenging.

NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models

gengzezhou/navgpt 26 May 2023

Trained on data at an unprecedented scale, large language models (LLMs) like ChatGPT and GPT-4 exhibit significant reasoning abilities that emerge from model scaling.