Visual Navigation
105 papers with code • 6 benchmarks • 16 datasets
Visual Navigation is the problem of navigating an agent, e.g., a mobile robot, through an environment using camera input alone. The agent is given a target image (the image it will see from the goal position), and it must reach that position by applying a sequence of actions, based only on its camera observations.
Source: Vision-based Navigation Using Deep Reinforcement Learning
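To make the setting concrete, below is a minimal sketch of the image-goal navigation loop described above. The `Env` class, its observation keys (`"rgb"`, `"goal_rgb"`), the `at_goal` success check, and the similarity threshold are all hypothetical stand-ins, not the API of any specific simulator; a real agent would replace the toy policy with a learned one (e.g., trained via reinforcement learning).

```python
# Minimal sketch of an image-goal visual navigation loop.
# Env, its observation keys, and at_goal() are hypothetical placeholders.
import numpy as np

ACTIONS = ["move_forward", "turn_left", "turn_right", "stop"]

def policy(obs_rgb: np.ndarray, goal_rgb: np.ndarray) -> str:
    """Toy policy: stop once the current view resembles the goal image,
    otherwise explore randomly. A trained agent replaces this."""
    diff = np.abs(obs_rgb.astype(float) - goal_rgb.astype(float)).mean()
    if diff < 5.0:  # similarity threshold (illustrative assumption)
        return "stop"
    return str(np.random.choice(ACTIONS[:-1]))  # random exploration

def navigate(env, max_steps: int = 500) -> bool:
    """Run one episode: act from camera observations until 'stop' or timeout."""
    obs = env.reset()  # assumed to return {"rgb": ..., "goal_rgb": ...}
    for _ in range(max_steps):
        action = policy(obs["rgb"], obs["goal_rgb"])
        if action == "stop":
            return env.at_goal()  # hypothetical success check
        obs = env.step(action)
    return False
```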
Latest papers
Language-enhanced RNR-Map: Querying Renderable Neural Radiance Field maps with natural language
We present Le-RNR-Map, a Language-enhanced Renderable Neural Radiance map for Visual Navigation with natural language query prompts.
Scaling Data Generation in Vision-and-Language Navigation
Recent research in language-guided visual navigation has demonstrated a significant demand for diverse traversable environments and large quantities of supervision for training generalizable agents.
Learning Navigational Visual Representations with Semantic Map Supervision
Being able to perceive the semantics and the spatial structure of the environment is essential for visual navigation of a household robot.
Online Self-Supervised Thermal Water Segmentation for Aerial Vehicles
We present a new method to adapt an RGB-trained water segmentation network to target-domain aerial thermal imagery via online self-supervision, leveraging texture and motion cues as supervisory signals.
The Drunkard's Odometry: Estimating Camera Motion in Deforming Scenes
Estimating camera motion in deformable scenes poses a complex and open research challenge.
HabiCrowd: A High Performance Simulator for Crowd-Aware Visual Navigation
Visual navigation, a foundational aspect of Embodied AI (E-AI), has been studied extensively in the past few years.
Sonicverse: A Multisensory Simulation Platform for Embodied Household Agents that See and Hear
We introduce Sonicverse, a multisensory simulation platform with integrated audio-visual simulation for training household agents that can both see and hear.
NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models
Trained on data at an unprecedented scale, large language models (LLMs) like ChatGPT and GPT-4 exhibit emergent reasoning abilities with model scaling.
POPGym: Benchmarking Partially Observable Reinforcement Learning
Real world applications of Reinforcement Learning (RL) are often partially observable, thus requiring memory.
Learning by Asking for Embodied Visual Navigation and Task Completion
The research community has shown increasing interest in designing intelligent embodied agents that can assist humans in accomplishing tasks.