We propose to using high level semantic and contextual features including segmentation and detection masks obtained by off-the-shelf state-of-the-art vision as observations and use deep network to learn the navigation policy.
The accumulated belief of the world enables the agent to track visited regions of the environment.
Nano-size unmanned aerial vehicles (UAVs), with few centimeters of diameter and sub-10 Watts of total power budget, have so far been considered incapable of running sophisticated visual-based autonomous navigation software without external aid from base-stations, ad-hoc local positioning infrastructure, and powerful external computation servers.
As part of our general methodology we discuss the software mapping techniques that enable the state-of-the-art deep convolutional neural network presented in  to be fully executed on-board within a strict 6 fps real-time constraint with no compromise in terms of flight results, while all processing is done with only 64 mW on average.
As deep learning continues to make progress for challenging perception tasks, there is increased interest in combining vision, language, and decision-making.
In this paper we study the problem of learning to learn at both training and test time in the context of visual navigation.