Visual Navigation
105 papers with code • 6 benchmarks • 16 datasets
Visual Navigation is the problem of navigating an agent, e.g. a mobile robot, in an environment using only camera input. The agent is given a target image (the image it would see from the target position), and its goal is to move from its current position to the target by applying a sequence of actions, based only on its camera observations.
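The observe-compare-act loop described above can be sketched as follows. This is a toy illustration, not any published method: all names are hypothetical, and a position's feature vector stands in for the camera image a real agent would encode with a learned model.

```python
# Toy sketch of image-goal navigation (all names hypothetical).
# A pose's feature vector stands in for the camera view at that pose.
from typing import List, Tuple

ACTIONS = {"north": (0, 1), "south": (0, -1), "east": (1, 0), "west": (-1, 0)}

def observe(pos: Tuple[int, int]) -> List[float]:
    """Stand-in for an image encoder: map a pose to a feature vector."""
    return [float(pos[0]), float(pos[1])]

def distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def navigate(start: Tuple[int, int], goal_obs: List[float], max_steps: int = 50):
    """Greedy policy: take the action whose predicted view best matches the goal image."""
    pos, trajectory = start, []
    for _ in range(max_steps):
        if distance(observe(pos), goal_obs) == 0:
            break  # current view matches the target image: done
        # score each action by the observation it would produce
        action = min(ACTIONS, key=lambda a: distance(
            observe((pos[0] + ACTIONS[a][0], pos[1] + ACTIONS[a][1])), goal_obs))
        pos = (pos[0] + ACTIONS[action][0], pos[1] + ACTIONS[action][1])
        trajectory.append(action)
    return pos, trajectory

goal = (3, 2)
final_pos, actions = navigate((0, 0), observe(goal))
print(final_pos, actions)
```

In real systems the greedy one-step comparison is replaced by a learned policy (often trained with deep RL, as in several papers below), since raw image similarity is a poor proxy for geometric progress toward the goal.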
Source: Vision-based Navigation Using Deep Reinforcement Learning
Latest papers with no code
TDANet: Target-Directed Attention Network For Object-Goal Visual Navigation With Zero-Shot Ability
Learning domain-independent visual representations is critical for enabling a trained DRL agent to generalize to unseen scenes and objects.
Separated Attention: An Improved Cycle GAN Based Under Water Image Enhancement Method
In this paper we present an improved CycleGAN-based model for underwater image enhancement.
Wild Visual Navigation: Fast Traversability Learning via Pre-Trained Models and Online Self-Supervision
Natural environments such as forests and grasslands are challenging for robotic navigation because of the false perception of rigid obstacles from high grass, twigs, or bushes.
GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation
The Embodied AI community has made significant strides in visual navigation tasks, exploring targets from 3D coordinates, objects, language descriptions, and images.
3MOS: Multi-sources, Multi-resolutions, and Multi-scenes dataset for Optical-SAR image matching
Optical-SAR image matching is a fundamental task for image fusion and visual navigation.
GaussNav: Gaussian Splatting for Visual Navigation
In embodied vision, Instance ImageGoal Navigation (IIN) requires an agent to locate a specific object depicted in a goal image within an unexplored environment.
OpenOcc: Open Vocabulary 3D Scene Reconstruction via Occupancy Representation
We model the geometric structure of the scene with occupancy representation and distill the pre-trained open vocabulary model into a 3D language field via volume rendering for zero-shot inference.
VANP: Learning Where to See for Navigation with Self-Supervised Vision-Action Pre-Training
However, most robotic visual navigation methods rely on deep learning models pre-trained on vision tasks, which prioritize salient objects that are not necessarily relevant to navigation and can be misleading.
A Landmark-Aware Visual Navigation Dataset
However, recent advances in visual navigation are hindered by the lack of real-world human datasets for efficient supervised representation learning of environments.
Interpretable Brain-Inspired Representations Improve RL Performance on Visual Navigation Tasks
Visual navigation requires a whole range of capabilities.