Visual Navigation

105 papers with code • 6 benchmarks • 16 datasets

Visual Navigation is the problem of navigating an agent, e.g. a mobile robot, through an environment using only camera input. The agent is given a target image (the view it will see from the target position), and its goal is to move from its current position to the target by applying a sequence of actions chosen from its camera observations alone.

Source: Vision-based Navigation Using Deep Reinforcement Learning
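The image-goal setup described above can be sketched as a simple observe-act loop. The snippet below is a minimal illustration, not any particular paper's method: the action names, the cosine-similarity "perception", and the greedy policy are all toy assumptions standing in for a learned policy and a real simulator.

```python
import numpy as np

# Hypothetical discrete action space, similar in spirit to common
# visual-navigation benchmarks (names are illustrative only).
ACTIONS = ["move_forward", "turn_left", "turn_right", "stop"]

def image_similarity(obs, goal):
    """Cosine similarity between flattened images (toy perception stand-in)."""
    a = obs.ravel().astype(float)
    b = goal.ravel().astype(float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def greedy_policy(obs, goal, threshold=0.99):
    """Toy policy: stop once the current view matches the target image;
    otherwise keep moving. A trained policy network would replace this."""
    if image_similarity(obs, goal) >= threshold:
        return "stop"
    return "move_forward"

def navigate(env, goal, max_steps=100):
    """Generic image-goal navigation loop: observe, pick an action, repeat
    until the policy emits "stop" or the step budget runs out."""
    obs = env.reset()
    for _ in range(max_steps):
        action = greedy_policy(obs, goal)
        if action == "stop":
            break
        obs = env.step(action)
    return obs
```

In practice, `greedy_policy` is where the learning happens (e.g. a deep RL policy conditioned on both the observation and the goal image), while `env` is a simulator or a real robot exposing a `reset`/`step` interface.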

Latest papers with no code

End-to-End (Instance)-Image Goal Navigation through Correspondence as an Emergent Phenomenon

no code yet • 28 Sep 2023

The main challenge lies in learning compact representations generalizable to unseen environments and in learning high-capacity perception modules capable of reasoning on high-dimensional input.

STERLING: Self-Supervised Terrain Representation Learning from Unconstrained Robot Experience

no code yet • 26 Sep 2023

Terrain awareness, i.e., the ability to identify and distinguish different types of terrain, is critical for robots to succeed at autonomous off-road navigation.

Wait, That Feels Familiar: Learning to Extrapolate Human Preferences for Preference Aligned Path Planning

no code yet • 18 Sep 2023

In this work, we posit that operator preferences for visually novel terrains, which the robot should adhere to, can often be extrapolated from established terrain references within the inertial, proprioceptive, and tactile domains.

Multi3DRefer: Grounding Text Description to Multiple 3D Objects

no code yet • ICCV 2023

We introduce the task of localizing a flexible number of objects in real-world 3D scenes using natural language descriptions.

Omnidirectional Information Gathering for Knowledge Transfer-based Audio-Visual Navigation

no code yet • ICCV 2023

CCPD transfers the fundamental point-to-point wayfinding skill, well trained on the large-scale PointGoal task, to ORAN, helping ORAN master audio-visual navigation with far fewer training samples.

Context-Aware Planning and Environment-Aware Memory for Instruction Following Embodied Agents

no code yet • ICCV 2023

Accomplishing household tasks requires planning step-by-step actions while considering the consequences of previous actions.

Multi-goal Audio-visual Navigation using Sound Direction Map

no code yet • 1 Aug 2023

However, no generalized navigation task has yet been proposed that combines these two task types and uses both visual and auditory information when multiple sound sources serve as goals.

ViNT: A Foundation Model for Visual Navigation

no code yet • 26 Jun 2023

In this paper, we describe the Visual Navigation Transformer (ViNT), a foundation model that aims to bring the success of general-purpose pre-trained models to vision-based robotic navigation.

CAVEN: An Embodied Conversational Agent for Efficient Audio-Visual Navigation in Noisy Environments

no code yet • 6 Jun 2023

Audio-visual navigation of an agent towards an audio goal is a challenging task, especially when the audio is sporadic or the environment is noisy.

SACSoN: Scalable Autonomous Control for Social Navigation

no code yet • 2 Jun 2023

By minimizing this counterfactual perturbation, we can induce robots to behave in ways that do not alter the natural behavior of humans in the shared space.