Visual Navigation
105 papers with code • 6 benchmarks • 16 datasets
Visual Navigation is the problem of navigating an agent, e.g. a mobile robot, in an environment using only camera input. The agent is given a target image (the image it would see from the target position), and its goal is to move from its current position to the target by applying a sequence of actions, based only on its camera observations.
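The observe-compare-act loop described above can be sketched as follows. This is a toy illustration, not any published method: all names are hypothetical, and a position's feature vector stands in for the camera image a real agent would encode with a learned model.

```python
# Toy sketch of image-goal navigation (all names hypothetical).
# A pose's feature vector stands in for the camera view at that pose.
from typing import List, Tuple

ACTIONS = {"north": (0, 1), "south": (0, -1), "east": (1, 0), "west": (-1, 0)}

def observe(pos: Tuple[int, int]) -> List[float]:
    """Stand-in for an image encoder: map a pose to a feature vector."""
    return [float(pos[0]), float(pos[1])]

def distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def navigate(start: Tuple[int, int], goal_obs: List[float], max_steps: int = 50):
    """Greedy policy: take the action whose predicted view best matches the goal image."""
    pos, trajectory = start, []
    for _ in range(max_steps):
        if distance(observe(pos), goal_obs) == 0:
            break  # current view matches the target image: done
        # score each action by the observation it would produce
        action = min(ACTIONS, key=lambda a: distance(
            observe((pos[0] + ACTIONS[a][0], pos[1] + ACTIONS[a][1])), goal_obs))
        pos = (pos[0] + ACTIONS[action][0], pos[1] + ACTIONS[action][1])
        trajectory.append(action)
    return pos, trajectory

goal = (3, 2)
final_pos, actions = navigate((0, 0), observe(goal))
print(final_pos, actions)
```

In real systems the greedy one-step comparison is replaced by a learned policy (often trained with deep RL, as in several papers below), since raw image similarity is a poor proxy for geometric progress toward the goal.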
Source: Vision-based Navigation Using Deep Reinforcement Learning
Latest papers with no code
TDANet: Target-Directed Attention Network For Object-Goal Visual Navigation With Zero-Shot Ability
Learning domain-independent visual representations is critical for enabling a trained DRL agent to generalize to unseen scenes and objects.
Separated Attention: An Improved Cycle GAN Based Under Water Image Enhancement Method
In this paper we present an improved CycleGAN-based model for underwater image enhancement.
Wild Visual Navigation: Fast Traversability Learning via Pre-Trained Models and Online Self-Supervision
Natural environments such as forests and grasslands are challenging for robotic navigation because of the false perception of rigid obstacles from high grass, twigs, or bushes.
GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation
The Embodied AI community has made significant strides in visual navigation tasks, exploring targets from 3D coordinates, objects, language descriptions, and images.
3MOS: Multi-sources, Multi-resolutions, and Multi-scenes dataset for Optical-SAR image matching
Optical-SAR image matching is a fundamental task for image fusion and visual navigation.
GaussNav: Gaussian Splatting for Visual Navigation
In embodied vision, Instance ImageGoal Navigation (IIN) requires an agent to locate a specific object depicted in a goal image within an unexplored environment.
OpenOcc: Open Vocabulary 3D Scene Reconstruction via Occupancy Representation
We model the geometric structure of the scene with occupancy representation and distill the pre-trained open vocabulary model into a 3D language field via volume rendering for zero-shot inference.
VANP: Learning Where to See for Navigation with Self-Supervised Vision-Action Pre-Training
However, most robotic visual navigation methods rely on deep learning models pre-trained on vision tasks, which prioritize salient objects that are not necessarily relevant to navigation and can be misleading.
A Landmark-Aware Visual Navigation Dataset
However, recent advances in visual navigation are hindered by the lack of real-world human datasets for efficient supervised representation learning of environments.
Interpretable Brain-Inspired Representations Improve RL Performance on Visual Navigation Tasks
Visual navigation requires a whole range of capabilities.