As the demand for enabling high-level autonomous driving has increased in
recent years and visual perception is one of the critical features to enable
fully autonomous driving, in this paper, we introduce an efficient approach for
simultaneous object detection, depth estimation and pixel-level semantic
segmentation using a shared convolutional architecture. The proposed network
model, which we named Driving Scene Perception Network (DSPNet), uses
multi-level feature maps and multi-task learning to improve the accuracy and
efficiency of object detection, depth estimation and image segmentation tasks
from a single input image...
Hence, the resulting network model uses less than
850 MiB of GPU memory and achieves 14.0 fps on NVIDIA GeForce GTX 1080 with a
1024x512 input image, and both precision and efficiency have been improved over
combination of single tasks.