This paper introduces an approach to monocular SLAM based entirely on deep
learning, which performs localization, using a neural network for learning
visual odometry (L-VO), together with dense 3D mapping. Dense 2D flow and a depth image
are generated from monocular images by sub-networks; both are then used by a
3D flow associated layer in the L-VO network to generate dense 3D flow. From
this 3D flow, the dual-stream L-VO network predicts the 6DOF relative
pose and, in turn, reconstructs the vehicle trajectory. In order to learn the
correlation between motion directions, bivariate Gaussian modelling is
employed in the loss function. On the KITTI odometry benchmark, the L-VO
network achieves an average translational error of 2.68% and an average
rotational error of 0.0143 deg/m. Moreover, the learned depth
is fully leveraged to generate a dense 3D map. As a result, an entire visual
SLAM system, that is, learning monocular odometry combined with dense 3D
mapping, is achieved.
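
To illustrate the role of bivariate Gaussian modelling in the loss function, the following is a minimal sketch of the negative log-likelihood of a pair of motion components under a bivariate Gaussian with a correlation coefficient. It is an illustration under stated assumptions, not the loss used in the paper: the function name `bivariate_gaussian_nll`, the choice of which two motion components are paired, and the parameterization by `sigma_x`, `sigma_y` and `rho` are all hypothetical, and only the standard bivariate Gaussian density is taken as given.

```python
import math


def bivariate_gaussian_nll(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Negative log-likelihood of (x, y) under a bivariate Gaussian.

    (x, y) stands for two ground-truth motion components and
    (mu_x, mu_y, sigma_x, sigma_y, rho) for hypothetical network outputs.
    The pairing of components is an assumption made for illustration.
    """
    if sigma_x <= 0 or sigma_y <= 0:
        raise ValueError("standard deviations must be positive")
    if not -1.0 < rho < 1.0:
        raise ValueError("correlation must lie strictly between -1 and 1")

    # Standardized residuals along each motion direction.
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y

    one_minus_rho2 = 1.0 - rho * rho

    # Quadratic (Mahalanobis) term; the cross term carries the correlation.
    quad = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / (2.0 * one_minus_rho2)

    # Log of the normalizing constant 2*pi*sigma_x*sigma_y*sqrt(1 - rho^2).
    log_norm = math.log(2.0 * math.pi * sigma_x * sigma_y)
    log_norm += 0.5 * math.log(one_minus_rho2)

    return log_norm + quad
```

With a positive `rho`, residuals that agree in sign are penalized less than residuals of the same magnitude that disagree in sign, which is how a loss of this form can encode correlation between two motion directions.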