Vision-Language Navigation

31 papers with code • 1 benchmark • 7 datasets

Vision-language navigation (VLN) is the task of navigating an embodied agent to carry out natural language instructions inside real 3D environments.

(Image credit: Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout)

Benchmarks

These leaderboards are used to track progress in Vision-Language Navigation.

Dataset	Best Model
Room2Room	R2R+EnvDrop

Latest papers

Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation

mrzihan/hnr-vln • 2 Apr 2024

Vision-and-language navigation (VLN) enables an agent to navigate to a remote location in 3D environments by following natural language instructions.

Volumetric Environment Representation for Vision-Language Navigation

defaultrui/vln-ver • 21 Mar 2024

To achieve a comprehensive 3D representation with fine-grained details, we introduce a Volumetric Environment Representation (VER), which voxelizes the physical world into structured 3D cells.
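
As a rough illustration of what "voxelizing" a scene into structured 3D cells can mean, the sketch below groups 3D points into cubic cells keyed by integer grid indices. It is a generic point-binning example, not the VER method from the paper; the function name `voxelize`, the `voxel_size` parameter, and the sample points are all assumptions made for illustration.

```python
from collections import defaultdict
from math import floor


def voxelize(points, voxel_size):
    """Group 3D points into cubic cells keyed by integer (i, j, k) indices.

    Hypothetical helper for illustration only; not taken from any VLN codebase.
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    cells = defaultdict(list)
    for x, y, z in points:
        # floor() keeps negative coordinates in their own cells instead of
        # folding them into cell 0, which int() truncation would do.
        key = (floor(x / voxel_size), floor(y / voxel_size), floor(z / voxel_size))
        cells[key].append((x, y, z))
    return dict(cells)


# Made-up sample points, in metres.
points = [(0.1, 0.2, 0.3), (0.4, 0.4, 0.4), (1.2, 0.1, 0.0), (-0.1, 0.0, 0.0)]
grid = voxelize(points, voxel_size=0.5)
```

Here the first two points share the cell `(0, 0, 0)`, while the other two fall into separate cells. A learned volumetric representation would typically store features per cell rather than raw points.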

Planning as In-Painting: A Diffusion-Based Embodied Task Planning Framework for Environments under Uncertainty

joeyy5588/planning-as-inpainting • 2 Dec 2023

In this paper, we aim to tackle this problem with a unified framework consisting of an end-to-end trainable method and a planning algorithm.

An Embodied Generalist Agent in 3D World

embodied-generalist/embodied-generalist • 18 Nov 2023

Leveraging massive knowledge and learning schemes from large language models (LLMs), recent machine learning models show notable successes in building generalist agents that exhibit the capability of general-purpose task solving in diverse domains, including natural language processing, computer vision, and robotics.

197 stars

Bird's-Eye-View Scene Graph for Vision-Language Navigation

defaultrui/bev-scene-graph • ICCV 2023

Vision-language navigation (VLN), which requires an agent to navigate 3D environments following human instructions, has shown great advances.

09 Aug 2023

ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments

marsaki/etpnav • 6 Apr 2023

To develop a robust VLN-CE agent, we propose a new navigation framework, ETPNav, which focuses on two critical skills: 1) the capability to abstract environments and generate long-range navigation plans, and 2) the ability of obstacle-avoiding control in continuous environments.
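
To make the idea of long-range planning over an abstracted environment more concrete, here is a minimal sketch that runs Dijkstra's shortest-path search over a small waypoint graph. It is not ETPNav's planner; the function name `plan_path`, the room names, and the edge lengths are invented for illustration, and the graph is assumed to be a dict mapping each node to its neighbours and edge lengths.

```python
import heapq


def plan_path(graph, start, goal):
    """Return the shortest node sequence from start to goal, or None.

    graph: dict mapping node -> {neighbour: edge_length}.
    Hypothetical helper for illustration only.
    """
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry for an already settled node
        visited.add(node)
        if node == goal:
            break
        for nbr, length in graph.get(node, {}).items():
            new_d = d + length
            if new_d < dist.get(nbr, float("inf")):
                dist[nbr] = new_d
                prev[nbr] = node
                heapq.heappush(heap, (new_d, nbr))
    if goal not in dist:
        return None  # goal is unreachable from start
    # Walk the predecessor links back from the goal.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


# Made-up topological map: nodes are waypoints, weights are distances in metres.
graph = {
    "hall": {"kitchen": 4.0, "stairs": 2.0},
    "stairs": {"hall": 2.0, "bedroom": 3.0},
    "kitchen": {"hall": 4.0, "bedroom": 6.0},
    "bedroom": {"stairs": 3.0, "kitchen": 6.0},
}
route = plan_path(graph, "hall", "bedroom")
```

In this toy map the route goes through the stairs (total length 5.0) rather than the kitchen (10.0). In a continuous environment, a separate low-level controller would still be needed to move between consecutive waypoints while avoiding obstacles.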

161 stars

Adaptive Zone-Aware Hierarchical Planner for Vision-Language Navigation

chengaopro/azhp • CVPR 2023

In this paper, we propose an Adaptive Zone-aware Hierarchical Planner (AZHP) to explicitly divide the navigation process into two heterogeneous phases, i.e., sub-goal setting via zone partition/selection (high-level action) and sub-goal executing (low-level action), for hierarchical planning.

01 Jan 2023

Towards Versatile Embodied Navigation

hanqingwangai/vxn • 30 Oct 2022

With the emergence of varied visual navigation tasks (e.g., image-/object-/audio-goal and vision-language navigation) that specify the target in different ways, the community has made appealing advances in training specialized agents capable of handling individual navigation tasks well.

DANLI: Deliberative Agent for Following Natural Language Instructions

sled-group/danli • 22 Oct 2022

These reactive agents are insufficient for long-horizon complex tasks.

Target-Driven Structured Transformer Planner for Vision-Language Navigation

yushengzhao/td-stp • 19 Jul 2022

Vision-language navigation is the task of directing an embodied agent to navigate in 3D scenes with natural language instructions.
