We present a novel deep neural network architecture for end-to-end scene flow estimation that directly operates on large-scale 3D point clouds.
Understanding dynamic 3D environment is crucial for robotic agents and many other applications.
We introduce a compact network for holistic scene flow estimation, called SENSE, which shares common encoder features among four closely-related tasks: optical flow estimation, disparity estimation from stereo, occlusion estimation, and semantic segmentation.
By combining a flow and disparity estimation network and training it jointly, we demonstrate the first scene flow estimation with a convolutional network.
Performance on the five tasks of depth estimation, optical flow estimation, odometry, moving object segmentation and scene flow estimation shows that our approach outperforms other SoTA methods.
Making use of the estimated occlusions, we also show improved results on motion segmentation and scene flow estimation.
This paper introduces a first effort to apply a deep learning method for direct estimation of scene flow by presenting a fully convolutional neural network with an encoder-decoder (ED) architecture.