EPRNet: Efficient Pyramid Representation Network for Real-Time Street Scene Segmentation

Current scene segmentation methods suffer from cumbersome model structures and high computational complexity, impeding their application to real-world scenarios that require real-time processing. This paper proposes the Efficient Pyramid Representation Network (EPRNet), which achieves a strong combination of segmentation accuracy, model compactness, and inference efficiency. Unlike existing methods that perform transfer learning on pixel features with limited receptive fields encoded by shallow image-classification backbones, EPRNet distributes multi-scale representations throughout the feature-encoding flow to quickly enlarge and enrich receptive fields. Specifically, we introduce an extremely lightweight and efficient Multi-scale Processing Unit (MPU) that encodes multi-scale features through parallel convolutions with different kernel sizes. By combining the MPU with residual learning, we build a core Pyramid Representation Module (PRM) that acquires and aggregates region-based contexts in both shallow and deep layers. In this way, EPRNet encodes discriminative and comprehensive representations of multi-scale objects with a compact structure. Extensive experiments on the Cityscapes and CamVid datasets demonstrate its superiority: without any extra or coarsely labeled data, EPRNet attains 73.9% mIoU on the Cityscapes test set with only 0.9 million parameters at 42 FPS.
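The core idea of the MPU and PRM can be illustrated with a minimal sketch: parallel convolutions with different kernel sizes capture contexts at several receptive-field scales, their outputs are fused, and a residual connection preserves the input signal. The function names, the choice of kernel sizes, and the placeholder averaging kernels below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def conv2d_same(x, k):
    """Naive single-channel 2D convolution with zero padding ('same' output size)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    H, W = x.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def mpu(x, kernel_sizes=(1, 3, 5)):
    """Sketch of a Multi-scale Processing Unit: parallel convolutions with
    different kernel sizes, fused by summation. Kernel sizes and the
    uniform averaging kernels are hypothetical placeholders."""
    branches = []
    for ks in kernel_sizes:
        k = np.full((ks, ks), 1.0 / (ks * ks))  # placeholder averaging kernel
        branches.append(conv2d_same(x, k))
    return sum(branches)

def prm(x):
    """Sketch of a Pyramid Representation Module: MPU plus a residual connection."""
    return x + mpu(x)

feature_map = np.random.rand(16, 16)
fused = prm(feature_map)
print(fused.shape)  # (16, 16)
```

In a real network these branches would be learned convolutions applied channel-wise, but the sketch shows why the design enlarges receptive fields cheaply: each branch sees a different neighborhood size, and the residual path keeps the module easy to optimize.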
