Vision and Language Navigation
104 papers with code • 5 benchmarks • 13 datasets
Most implemented papers
Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments
This is significant because a robot interpreting a natural-language navigation instruction on the basis of what it sees is carrying out a vision and language process that is similar to Visual Question Answering.
Touchdown: Natural Language Navigation and Spatial Reasoning in Visual Street Environments
We study the problem of jointly reasoning about language and vision through a navigation and spatial reasoning task.
Retouchdown: Adding Touchdown to StreetLearn as a Shareable Resource for Language Grounding Tasks in Street View
The Touchdown panoramas have been added to the StreetLearn dataset and can be obtained via the same process used previously for StreetLearn.
Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments
We develop a language-guided navigation task set in a continuous 3D environment where agents must execute low-level actions to follow natural language navigation directions.
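Below is a minimal sketch of the low-level action interface such a continuous-environment agent operates over (forward step, left/right turn, stop). The step and turn magnitudes follow the VLN-CE setup, but the `agent` and `env` objects are illustrative placeholders, not the paper's code or the Habitat API.

```python
from enum import Enum

class LowLevelAction(Enum):
    """Discrete low-level actions in the continuous-environment setting
    (VLN-CE uses ~0.25 m forward steps and ~15-degree turns)."""
    STOP = 0
    MOVE_FORWARD = 1   # advance ~0.25 m
    TURN_LEFT = 2      # rotate ~15 degrees left
    TURN_RIGHT = 3     # rotate ~15 degrees right

def rollout(agent, env, instruction, max_steps=500):
    """Generic episode loop: the agent maps (instruction, observation)
    to low-level actions until it predicts STOP or the step budget runs out."""
    obs = env.reset(instruction)
    for _ in range(max_steps):
        action = agent.act(instruction, obs)  # policy picks the next low-level action
        if action == LowLevelAction.STOP:
            break
        obs = env.step(action)
    return env.current_position()
```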
How Much Can CLIP Benefit Vision-and-Language Tasks?
Most existing Vision-and-Language (V&L) models rely on pre-trained visual encoders, using a relatively small set of manually-annotated data (as compared to web-crawled data), to perceive the visual world.
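As a rough illustration of the idea, the sketch below uses a frozen CLIP image encoder (via Hugging Face `transformers`) as a drop-in visual backbone that produces one feature vector per panoramic view. The checkpoint name and the single-vector-per-view pooling are assumptions for the example, not the paper's exact pipeline.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Frozen CLIP visual encoder used in place of an ImageNet/region-feature backbone.
# "openai/clip-vit-base-patch32" is one public checkpoint, chosen for illustration.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def encode_views(image_paths):
    """Return one unit-normalized CLIP feature vector per view image."""
    images = [Image.open(p).convert("RGB") for p in image_paths]
    inputs = processor(images=images, return_tensors="pt")
    feats = model.get_image_features(**inputs)        # shape: (num_views, 512)
    return feats / feats.norm(dim=-1, keepdim=True)
```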
The Regretful Agent: Heuristic-Aided Navigation through Progress Estimation
As deep learning continues to make progress for challenging perception tasks, there is increased interest in combining vision, language, and decision-making.
Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding
We introduce Room-Across-Room (RxR), a new Vision-and-Language Navigation (VLN) dataset.
Self-Monitoring Navigation Agent via Auxiliary Progress Estimation
The Vision-and-Language Navigation (VLN) task entails an agent following navigational instructions in photo-realistic, previously unseen environments.
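The sketch below conveys the auxiliary progress-estimation idea in PyTorch: next to the action logits, a small head regresses a scalar progress signal in [0, 1], trained jointly with the navigation loss. Layer sizes and the loss weight are illustrative choices; the paper's actual monitor additionally conditions on attention over the instruction.

```python
import torch
import torch.nn as nn

class PolicyWithProgressMonitor(nn.Module):
    """Navigation policy with an auxiliary progress-estimation head.

    The progress head regresses how much of the instruction/path has been
    completed; hidden sizes here are illustrative, not the paper's config.
    """
    def __init__(self, hidden_dim=512, num_actions=6):
        super().__init__()
        self.action_head = nn.Linear(hidden_dim, num_actions)
        self.progress_head = nn.Sequential(
            nn.Linear(hidden_dim, 128), nn.Tanh(), nn.Linear(128, 1), nn.Sigmoid()
        )

    def forward(self, h):
        # h: agent hidden state, shape (batch, hidden_dim)
        return self.action_head(h), self.progress_head(h).squeeze(-1)

def joint_loss(action_logits, progress_pred, action_target, progress_target, lam=0.5):
    """Cross-entropy navigation loss plus a weighted MSE progress loss."""
    nav = nn.functional.cross_entropy(action_logits, action_target)
    mon = nn.functional.mse_loss(progress_pred, progress_target)
    return nav + lam * mon
```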
Airbert: In-domain Pretraining for Vision-and-Language Navigation
Given the scarcity of domain-specific training data and the high diversity of image and language inputs, the generalization of VLN agents to unseen environments remains challenging.
NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models
Trained on data at unprecedented scale, large language models (LLMs) such as ChatGPT and GPT-4 exhibit emergent reasoning abilities arising from model scaling.
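As a hedged sketch of the explicit-reasoning loop, the example below serializes the instruction, navigation history, and textual descriptions of candidate directions into a prompt, asks the LLM to reason step by step, and parses the chosen candidate from its final line. The `call_llm` function and the prompt template are hypothetical placeholders, not NavGPT's actual prompts or interface.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for an LLM API call (e.g., a chat-completion request);
    not NavGPT's actual interface."""
    raise NotImplementedError

def navgpt_style_step(instruction, history, candidate_views):
    """Serialize the VLN state into text, ask the LLM to reason explicitly,
    and parse the chosen candidate index (or stop) from its final line."""
    prompt = (
        f"Instruction: {instruction}\n"
        f"History: {'; '.join(history) or 'none'}\n"
        "Candidate directions:\n"
        + "\n".join(f"  [{i}] {desc}" for i, desc in enumerate(candidate_views))
        + "\nThink step by step about which candidate follows the instruction, "
          "then answer on the last line as 'Action: <index>' or 'Action: stop'."
    )
    reply = call_llm(prompt)
    last = reply.strip().splitlines()[-1].lower()
    return None if "stop" in last else int(last.split(":")[-1].strip())
```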