We propose using high-level semantic and contextual features, including segmentation and detection masks obtained from off-the-shelf state-of-the-art vision models, as observations, and use a deep network to learn the navigation policy.
The accumulated belief about the world enables the agent to keep track of visited regions of the environment.
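The following is only an illustrative sketch of the two ideas above, not the authors' implementation: the observation is a stack of semantic/detection mask channels rather than raw pixels, and an accumulated map marks regions the agent has already visited. All class names, tensor shapes, and hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

class MaskPolicy(nn.Module):
    def __init__(self, num_mask_channels=16, num_actions=4, map_size=64):
        super().__init__()
        # Encoder over semantic/detection mask channels plus one "visited" belief channel.
        self.encoder = nn.Sequential(
            nn.Conv2d(num_mask_channels + 1, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.policy_head = nn.Linear(64, num_actions)
        # Accumulated belief of the world: which map cells have been visited so far.
        self.register_buffer("visited", torch.zeros(1, 1, map_size, map_size))

    def forward(self, masks, agent_cell):
        # masks: (1, C, H, W) masks produced by an off-the-shelf vision model.
        # agent_cell: (row, col) of the agent in the belief-map grid.
        self.visited[0, 0, agent_cell[0], agent_cell[1]] = 1.0  # mark current region as visited
        visited = nn.functional.interpolate(self.visited, size=masks.shape[-2:])
        obs = torch.cat([masks, visited], dim=1)
        return self.policy_head(self.encoder(obs))  # action logits

policy = MaskPolicy()
logits = policy(torch.rand(1, 16, 64, 64), agent_cell=(10, 12))
```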
We find that SRCC for Habitat as used for the CVPR19 challenge is low (0.18 for the success metric), which suggests that performance improvements for this simulator-based challenge would not transfer well to a physical robot.
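The excerpt does not define SRCC, so the sketch below is only illustrative: assuming SRCC is a correlation between per-agent success rates measured in simulation and on a physical robot, it could be computed as follows. The agent scores and the use of Pearson correlation are assumptions, not the paper's data or exact statistic.

```python
from scipy.stats import pearsonr

sim_success  = [0.91, 0.85, 0.78, 0.70, 0.62]   # success metric of 5 agents in simulation (placeholder values)
real_success = [0.40, 0.55, 0.35, 0.52, 0.45]   # success metric of the same agents on a robot (placeholder values)

corr, _ = pearsonr(sim_success, real_success)
print(f"SRCC (assumed here to be a Pearson correlation) = {corr:.2f}")
# A value near 0.18 would mean that an agent's simulator performance
# says little about how it will perform on the real robot.
```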
Nano-size unmanned aerial vehicles (UAVs), with diameters of a few centimeters and a total power budget below 10 W, have so far been considered incapable of running sophisticated vision-based autonomous navigation software without external aid from base stations, ad-hoc local positioning infrastructure, and powerful external computation servers.
As part of our general methodology, we discuss the software mapping techniques that enable the state-of-the-art deep convolutional neural network presented in  to be executed fully on-board within a strict 6 fps real-time constraint, with no compromise in flight results, while all processing consumes only 64 mW on average.
In this paper we study the problem of learning to learn at both training and test time in the context of visual navigation.
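The sentence above does not spell out how the learning-to-learn step works, so the following is only a generic sketch of test-time adaptation: during an episode, the agent takes a few gradient steps on a self-supervised auxiliary loss so the policy adapts to the current scene. The policy architecture, the placeholder loss, and the step size are all assumptions, not the paper's method.

```python
import copy
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 4))

def adapt_at_test_time(policy, observations, inner_lr=0.01, steps=1):
    """Return a copy of the policy adapted with a self-supervised loss on recent observations."""
    adapted = copy.deepcopy(policy)
    opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
    for _ in range(steps):
        logits = adapted(observations)
        # Placeholder self-supervised objective (a real system would use an
        # interaction- or prediction-based loss); entropy is only a stand-in here.
        loss = -(logits.log_softmax(-1) * logits.softmax(-1)).sum(-1).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return adapted

adapted_policy = adapt_at_test_time(policy, torch.rand(8, 128))
```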
As deep learning continues to make progress for challenging perception tasks, there is increased interest in combining vision, language, and decision-making.