Ego-Centric Spatial Memory Networks

Egocentric spatial memory is a cornerstone of our understanding of biological perception. Spatial memory is also essential for embodied AI systems acting from image sensors in partially observable (PO) environments. In image-to-action learning, the most common approach to dealing with partial observability is to incorporate general-purpose memory architectures, which do not explicitly leverage any spatial inductive bias; examples include Long Short-Term Memory, self-attention, and differentiable external memory. While other works have demonstrated the benefits of a learnt spatial memory, these are typically targeted at 2D navigation and encode the memory as aerial 2D maps. We propose Egocentric Spatial Memory Networks (ESMN), which encode the memory on an ego-sphere around the agent, enabling more expressive 3D representations. ESMN improves both training efficiency and final performance over other memory baselines on both drone and manipulator visuomotor control tasks. We see a benefit for both body-fixed and scene-fixed cameras, and show that ESMN can be trained via either imitation or reinforcement learning. The explicit ego-centric geometry also enables us to seamlessly combine the learnt controller with other non-learnt modalities, such as local obstacle avoidance, which is not possible with less structured memory. We further show applications to semantic segmentation on the ScanNet dataset, where ESMN naturally combines image-level and map-level inference modalities. Finally, when combined with a visual tracker, we show that the module can operate as a real-time, real-world mapping system, running at 30 Hz on either GPU or CPU and producing high-quality, robust ego-centric maps. With our broad demonstrations, we show that ESMN represents a useful and general computation graph for embodied spatial reasoning, and that the module forms a bridge between real-time mapping systems and differentiable memory architectures.
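To make the ego-sphere idea concrete, here is a minimal sketch of such a memory: features are stored on an (elevation, azimuth) grid centred on the agent, and a 3D point in the agent's frame is projected into a spherical bin for reads and writes. All names, grid resolutions, and the overwrite-style update rule are illustrative assumptions, not the paper's actual (learnt) architecture.

```python
import numpy as np

class EgoSphereMemory:
    """Hypothetical ego-sphere memory sketch (shapes and update rule assumed,
    not taken from the paper): a feature grid over elevation x azimuth bins."""

    def __init__(self, n_elev=32, n_azim=64, feat_dim=16):
        self.n_elev, self.n_azim = n_elev, n_azim
        self.grid = np.zeros((n_elev, n_azim, feat_dim))

    def _bin(self, point):
        # Convert a 3D point in the agent frame to spherical grid indices.
        x, y, z = point
        r = np.linalg.norm(point)
        elev = np.arccos(np.clip(z / max(r, 1e-9), -1.0, 1.0))  # [0, pi]
        azim = np.arctan2(y, x) % (2 * np.pi)                   # [0, 2*pi)
        i = min(int(elev / np.pi * self.n_elev), self.n_elev - 1)
        j = min(int(azim / (2 * np.pi) * self.n_azim), self.n_azim - 1)
        return i, j

    def write(self, point, feature):
        # Overwrite the bin; a learnt module could instead blend old and new.
        i, j = self._bin(point)
        self.grid[i, j] = feature

    def read(self, point):
        i, j = self._bin(point)
        return self.grid[i, j]
```

In a full system, the grid would also be re-indexed (rotated/shifted) as the agent moves so the representation stays ego-centric, and the update would be a learnt function rather than a plain overwrite.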
