Visual Navigation

105 papers with code • 6 benchmarks • 16 datasets

Visual Navigation is the problem of steering an agent, e.g. a mobile robot, through an environment using only camera input. The agent is given a target image (the view it would see from the target position), and its goal is to move from its current position to the target by applying a sequence of actions chosen from its camera observations alone.

Source: Vision-based Navigation Using Deep Reinforcement Learning
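The observe-act loop described above can be sketched with a toy example. Everything here is illustrative: the `GridEnv` environment, its 2-channel "camera views", and the greedy one-step lookahead are hypothetical stand-ins for a real simulator and a learned (e.g. DRL) policy, not any benchmark's actual API.

```python
import numpy as np

class GridEnv:
    """Toy stand-in for a visually navigable environment: the 'camera view'
    at each cell is a tiny 2-channel image that varies smoothly with
    position, so nearby cells look similar (as real camera views roughly do)."""
    ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

    def __init__(self, size=5, start=(0, 0)):
        self.size = size
        self.pos = start
        self.views = np.zeros((size, size, 2, 4, 4))
        for r in range(size):
            for c in range(size):
                self.views[r, c, 0] = r / size  # channel 0 encodes row
                self.views[r, c, 1] = c / size  # channel 1 encodes column

    def view_at(self, cell):
        return self.views[cell]

    def observe(self):
        return self.views[self.pos]

    def step(self, action):
        dr, dc = self.ACTIONS[action]
        self.pos = (min(max(self.pos[0] + dr, 0), self.size - 1),
                    min(max(self.pos[1] + dc, 0), self.size - 1))

def navigate(env, goal_image, max_steps=50):
    """Image-goal loop: observe, pick an action, repeat until the current
    view matches the goal view. The greedy one-step lookahead below is a
    placeholder for the learned policy a real method would train."""
    for t in range(max_steps):
        if np.array_equal(env.observe(), goal_image):
            return t  # current view matches the target view -> done

        def score(action):
            # Distance between the view after this action and the goal image.
            dr, dc = env.ACTIONS[action]
            nxt = (min(max(env.pos[0] + dr, 0), env.size - 1),
                   min(max(env.pos[1] + dc, 0), env.size - 1))
            return np.linalg.norm(env.view_at(nxt) - goal_image)

        env.step(min(env.ACTIONS, key=score))
    return None

env = GridEnv()
goal = env.view_at((4, 4))        # the "target image" given to the agent
steps = navigate(env, goal)       # reaches (4, 4) in 8 steps on this grid
```

The point of the sketch is the interface, not the policy: the agent only ever compares camera observations against the goal image; it never reads its own coordinates.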

Latest papers with no code

TDANet: Target-Directed Attention Network For Object-Goal Visual Navigation With Zero-Shot Ability

no code yet • 12 Apr 2024

Learning domain-independent visual representation is critical for enabling the trained DRL agent with the ability to generalize to unseen scenes and objects.

Separated Attention: An Improved Cycle GAN Based Under Water Image Enhancement Method

no code yet • 11 Apr 2024

In this paper, we present an improved CycleGAN-based model for underwater image enhancement.

Wild Visual Navigation: Fast Traversability Learning via Pre-Trained Models and Online Self-Supervision

no code yet • 10 Apr 2024

Natural environments such as forests and grasslands are challenging for robotic navigation because of the false perception of rigid obstacles from high grass, twigs, or bushes.

GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation

no code yet • 9 Apr 2024

The Embodied AI community has made significant strides in visual navigation tasks, exploring targets from 3D coordinates, objects, language descriptions, and images.

3MOS: Multi-sources, Multi-resolutions, and Multi-scenes dataset for Optical-SAR image matching

no code yet • 1 Apr 2024

Optical-SAR image matching is a fundamental task for image fusion and visual navigation.

GaussNav: Gaussian Splatting for Visual Navigation

no code yet • 18 Mar 2024

In embodied vision, Instance ImageGoal Navigation (IIN) requires an agent to locate a specific object depicted in a goal image within an unexplored environment.

OpenOcc: Open Vocabulary 3D Scene Reconstruction via Occupancy Representation

no code yet • 18 Mar 2024

We model the geometric structure of the scene with occupancy representation and distill the pre-trained open vocabulary model into a 3D language field via volume rendering for zero-shot inference.

VANP: Learning Where to See for Navigation with Self-Supervised Vision-Action Pre-Training

no code yet • 12 Mar 2024

However, most robotic visual navigation methods rely on deep learning models pre-trained on vision tasks, which prioritize salient objects -- not necessarily relevant to navigation and potentially misleading.

A Landmark-Aware Visual Navigation Dataset

no code yet • 22 Feb 2024

However, recent advances in visual navigation are hindered by the lack of real-world human datasets for efficient supervised representation learning of environments.

Interpretable Brain-Inspired Representations Improve RL Performance on Visual Navigation Tasks

no code yet • 19 Feb 2024

Visual navigation requires a whole range of capabilities.