Visual Navigation

105 papers with code • 6 benchmarks • 16 datasets

Visual Navigation is the problem of navigating an agent, such as a mobile robot, through an environment using camera input only. The agent is given a target image (the view it would see from the target position), and its goal is to move from its current position to the target by applying a sequence of actions chosen from its camera observations alone.

Source: Vision-based Navigation Using Deep Reinforcement Learning
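
The task definition above can be illustrated with a minimal sketch of a target-driven navigation loop. Everything here is a toy assumption made for illustration and is not taken from the cited paper: `CorridorEnv` is a hypothetical 1-D corridor whose camera frames get brighter toward the far end, `brightness_policy` is a hand-written stand-in for a learned policy, and `navigate` is a hypothetical episode loop. The point it shows is that the policy only ever sees the current frame and the target frame, never the agent's position.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Image = Tuple[int, ...]  # a flattened grayscale "camera frame"


@dataclass
class CorridorEnv:
    """Hypothetical 1-D corridor; frames get brighter toward the far end."""
    length: int = 10
    position: int = 0

    def render(self, position: Optional[int] = None) -> Image:
        # Four-pixel frame whose intensity depends only on the position.
        p = self.position if position is None else position
        return tuple(min(255, 20 * p + offset) for offset in (0, 1, 2, 3))

    def step(self, action: str) -> Image:
        # Move one cell, clamp to the corridor, and return the new frame.
        delta = 1 if action == "forward" else -1
        self.position = max(0, min(self.length - 1, self.position + delta))
        return self.render()


def brightness_policy(observation: Image, target: Image) -> str:
    """Toy policy: choose an action from the two images alone."""
    return "forward" if sum(observation) < sum(target) else "backward"


def navigate(
    env: CorridorEnv,
    policy: Callable[[Image, Image], str],
    target: Image,
    max_steps: int = 50,
) -> Tuple[bool, List[str]]:
    """Run one episode; stop when the current frame matches the target image."""
    observation = env.render()
    actions: List[str] = []
    for _ in range(max_steps):
        if observation == target:
            return True, actions
        action = policy(observation, target)
        observation = env.step(action)
        actions.append(action)
    return observation == target, actions


if __name__ == "__main__":
    env = CorridorEnv(length=10, position=2)
    target_image = env.render(position=7)  # the view from the goal position
    success, actions = navigate(env, brightness_policy, target_image)
    print(success, actions)
```

In a realistic setting the frames would be RGB images from a simulator or robot camera, and the policy would typically be a neural network trained with reinforcement or imitation learning; the loop structure stays the same.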

Benchmarks

These leaderboards are used to track progress in Visual Navigation.

Dataset	Best Model
R2R	Meta-Explore
Cooperative Vision-and-Dialogue Navigation	NaviLLM
SOON Test	AutoVLN
AI2-THOR	MVV-IN
Dmlab-30	PopArt-IMPALA
Help, Anna! (HANNA)	Prevalent

Libraries

Use these libraries to find Visual Navigation models and implementations.

mchancan/citylearn

2 papers

Most implemented papers

Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

peteanderson80/Matterport3DSimulator • CVPR 2018

This is significant because a robot interpreting a natural-language navigation instruction on the basis of what it sees is carrying out a vision and language process that is similar to Visual Question Answering.

Cognitive Mapping and Planning for Visual Navigation

tensorflow/models • CVPR 2017

The accumulated belief of the world enables the agent to track visited regions of the environment.

Think Locally, Act Globally: Federated Learning with Local and Global Representations

pliang279/LG-FedAvg • 6 Jan 2020

To this end, we propose a new federated learning algorithm that jointly learns compact local representations on each device and a global model across all devices.

A 64mW DNN-based Visual Navigation Engine for Autonomous Nano-Drones

pulp-platform/pulp-dronet • 4 May 2018

As part of our general methodology we discuss the software mapping techniques that enable the state-of-the-art deep convolutional neural network presented in [1] to be fully executed on-board within a strict 6 fps real-time constraint with no compromise in terms of flight results, while all processing is done with only 64 mW on average.

Visual Representations for Semantic Target Driven Navigation

tensorflow/models • 15 May 2018

We propose using high-level semantic and contextual features, including segmentation and detection masks obtained from off-the-shelf state-of-the-art vision models, as observations, and we use a deep network to learn the navigation policy.

The Regretful Agent: Heuristic-Aided Navigation through Progress Estimation

chihyaoma/regretful-agent • CVPR 2019

As deep learning continues to make progress for challenging perception tasks, there is increased interest in combining vision, language, and decision-making.
