Visual Navigation

105 papers with code • 6 benchmarks • 16 datasets

Visual Navigation is the problem of navigating an agent, e.g. a mobile robot, through an environment using only camera input. The agent is given a target image (the image it would see from the target position), and its goal is to move from its current position to the target by applying a sequence of actions chosen from its camera observations alone.

Source: Vision-based Navigation Using Deep Reinforcement Learning
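
The task definition above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the cited paper: the `GridWorld` class, the one-pixel RGB "images", the four discrete actions, and the hand-written snake-sweep policy in `navigate` (real systems typically learn the policy, e.g. with deep reinforcement learning). The point is only the interface: the agent never reads its own coordinates. It sees camera observations, compares them with the target image, and detects walls by noticing that the view did not change after a move.

```python
import random
from dataclasses import dataclass

# Hypothetical discrete action set: (row delta, column delta).
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


@dataclass
class GridWorld:
    """Toy environment: every cell shows a unique one-pixel RGB 'image'."""
    rows: int
    cols: int
    seed: int = 0

    def __post_init__(self):
        rng = random.Random(self.seed)
        cells = [(r, c) for r in range(self.rows) for c in range(self.cols)]
        # Sampling without replacement guarantees a distinct image per cell.
        codes = rng.sample(range(256 ** 3), len(cells))
        self.images = {
            cell: (code >> 16, (code >> 8) & 255, code & 255)
            for cell, code in zip(cells, codes)
        }
        self.pos = (0, 0)

    def observe(self):
        """Return the camera view at the current (hidden) position."""
        return self.images[self.pos]

    def step(self, action):
        """Apply an action; moves into a wall leave the position unchanged."""
        dr, dc = ACTIONS[action]
        r = min(max(self.pos[0] + dr, 0), self.rows - 1)
        c = min(max(self.pos[1] + dc, 0), self.cols - 1)
        self.pos = (r, c)
        return self.observe()


def navigate(env, target_image, max_steps=1000):
    """Snake-sweep the grid until the camera view equals target_image.

    Uses observations only. Returns (success, number_of_steps_taken).
    """
    obs = env.observe()
    steps = 0

    def move(action):
        """Take one step; return True if the view changed (no wall was hit)."""
        nonlocal obs, steps
        previous, obs = obs, env.step(action)
        steps += 1
        return obs != previous

    def run(action):
        """Repeat an action until the target is seen, a wall is hit, or the budget ends."""
        while obs != target_image and steps < max_steps and move(action):
            pass

    # Reach the top-left corner first, then sweep row by row.
    run("up")
    run("left")
    heading = "right"
    while obs != target_image and steps < max_steps:
        run(heading)
        if obs == target_image or steps >= max_steps or not move("down"):
            break
        heading = "left" if heading == "right" else "right"
    return obs == target_image, steps


env = GridWorld(rows=4, cols=5, seed=1)
env.pos = (2, 3)                      # start position, hidden from the policy
target_image = env.images[(3, 1)]     # what the agent would see at the goal
success, steps = navigate(env, target_image)
print(success, steps, env.pos)
```

An exhaustive sweep only works because the toy world is tiny and every view is unique. The sketch is meant to show the observation-in, action-out loop and the image-matching success test, not a practical navigation method.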

Benchmarks

These leaderboards are used to track progress in Visual Navigation.

Dataset	Best Model
R2R	Meta-Explore
Cooperative Vision-and-Dialogue Navigation	NaviLLM
SOON Test	AutoVLN
AI2-THOR	MVV-IN
DMLab-30	PopArt-IMPALA
Help, Anna! (HANNA)	Prevalent

Libraries

Use these libraries to find Visual Navigation models and implementations.

mchancan/citylearn

2 papers

Datasets

Latest papers with no code

End-to-End (Instance)-Image Goal Navigation through Correspondence as an Emergent Phenomenon

no code yet • 28 Sep 2023

The main challenge lies in learning compact representations generalizable to unseen environments and in learning high-capacity perception modules capable of reasoning on high-dimensional input.

STERLING: Self-Supervised Terrain Representation Learning from Unconstrained Robot Experience

no code yet • 26 Sep 2023

Terrain awareness, i.e., the ability to identify and distinguish different types of terrain, is a critical ability that robots must have to succeed at autonomous off-road navigation.

Wait, That Feels Familiar: Learning to Extrapolate Human Preferences for Preference Aligned Path Planning

no code yet • 18 Sep 2023

In this work, we posit that operator preferences for visually novel terrains, which the robot should adhere to, can often be extrapolated from established terrain references within the inertial, proprioceptive, and tactile domain.

Multi3DRefer: Grounding Text Description to Multiple 3D Objects

no code yet • ICCV 2023

We introduce the task of localizing a flexible number of objects in real-world 3D scenes using natural language descriptions.

Omnidirectional Information Gathering for Knowledge Transfer-based Audio-Visual Navigation

no code yet • ICCV 2023

CCPD transfers the fundamental, point-to-point wayfinding skill that is well trained on the large-scale PointGoal task to ORAN, so as to help ORAN to better master audio-visual navigation with far fewer training samples.

Context-Aware Planning and Environment-Aware Memory for Instruction Following Embodied Agents

no code yet • ICCV 2023

Accomplishing household tasks requires planning step-by-step actions while considering the consequences of previous actions.

Multi-goal Audio-visual Navigation using Sound Direction Map

no code yet • 1 Aug 2023

However, there has been no proposal for a generalized navigation task combining these two types of tasks and using both visual and auditory information in a situation where multiple sound sources are goals.

ViNT: A Foundation Model for Visual Navigation

no code yet • 26 Jun 2023

In this paper, we describe the Visual Navigation Transformer (ViNT), a foundation model that aims to bring the success of general-purpose pre-trained models to vision-based robotic navigation.

CAVEN: An Embodied Conversational Agent for Efficient Audio-Visual Navigation in Noisy Environments

no code yet • 6 Jun 2023

Audio-visual navigation of an agent towards an audio goal is a challenging task, especially when the audio is sporadic or the environment is noisy.

SACSoN: Scalable Autonomous Control for Social Navigation

no code yet • 2 Jun 2023

By minimizing this counterfactual perturbation, we can induce robots to behave in ways that do not alter the natural behavior of humans in the shared space.
