Pyramid Scene Parsing Network

Scene parsing is challenging because of its unrestricted open vocabulary and the diversity of scenes. In this paper, we exploit global context information through different-region-based context aggregation, using our pyramid pooling module together with the proposed pyramid scene parsing network (PSPNet). Our global prior representation is effective at producing good-quality results on the scene parsing task, while PSPNet provides a superior framework for pixel-level prediction tasks. The proposed approach achieves state-of-the-art performance on various datasets. It came first in the ImageNet scene parsing challenge 2016, the PASCAL VOC 2012 benchmark, and the Cityscapes benchmark. A single PSPNet yields a new record mIoU accuracy of 85.4% on PASCAL VOC 2012 and an accuracy of 80.2% on Cityscapes.
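
The "different-region-based context aggregation" mentioned in the abstract can be illustrated with a minimal NumPy sketch of a pyramid pooling step. This is not the authors' implementation: the bin sizes (1, 2, 3, 6) are taken from the published paper rather than from this abstract, nearest-neighbour upsampling stands in for bilinear interpolation, and the per-level 1x1 convolution that reduces channel count is omitted. The function names are illustrative only.

```python
import numpy as np

def adaptive_avg_pool(feat, bins):
    """Average-pool a (C, H, W) feature map into a (C, bins, bins) grid."""
    c, h, w = feat.shape
    out = np.zeros((c, bins, bins), dtype=np.float64)
    for i in range(bins):
        # Floor for the start and ceil for the end, so regions cover the map
        # and are never empty.
        r0 = (i * h) // bins
        r1 = ((i + 1) * h + bins - 1) // bins
        for j in range(bins):
            c0 = (j * w) // bins
            c1 = ((j + 1) * w + bins - 1) // bins
            out[:, i, j] = feat[:, r0:r1, c0:c1].mean(axis=(1, 2))
    return out

def upsample_nearest(pooled, h, w):
    """Resize a (C, b, b) grid back to (C, H, W) by nearest-neighbour lookup.

    Assumption: a simplification; bilinear interpolation is the usual choice.
    """
    _, bh, bw = pooled.shape
    rows = (np.arange(h) * bh) // h
    cols = (np.arange(w) * bw) // w
    return pooled[:, rows[:, None], cols[None, :]]

def pyramid_pool(feat, bin_sizes=(1, 2, 3, 6)):
    """Concatenate the input with context pooled at several region sizes."""
    _, h, w = feat.shape
    levels = [feat.astype(np.float64)]
    for b in bin_sizes:
        levels.append(upsample_nearest(adaptive_avg_pool(feat, b), h, w))
    # Channels: original C plus C per pyramid level (no channel reduction here).
    return np.concatenate(levels, axis=0)
```

For a (4, 12, 12) input and the default bin sizes, `pyramid_pool` returns a (20, 12, 12) array. The first pooled level (bin size 1) holds each channel's global average broadcast over every pixel, which is the coarsest form of the global prior.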

Published in CVPR 2017.
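
| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Semantic Segmentation | ADE20K | PSPNet | Validation mIoU | 44.94 | #189 |
| Semantic Segmentation | ADE20K | PSPNet | Test Score | 55.38 | #7 |
| Semantic Segmentation | ADE20K | PSPNet (ResNet-101) | Validation mIoU | 43.29 | #201 |
| Semantic Segmentation | ADE20K | PSPNet (ResNet-152) | Validation mIoU | 43.51 | #200 |
| Semantic Segmentation | ADE20K val | PSPNet (ResNet-101) | mIoU | 43.29% | #86 |
| Semantic Segmentation | ADE20K val | PSPNet (ResNet-152) | mIoU | 43.51% | #85 |
| Lesion Segmentation | Anatomical Tracings of Lesions After Stroke (ATLAS) | PSPNet | Dice | 0.3571 | #5 |
| Lesion Segmentation | Anatomical Tracings of Lesions After Stroke (ATLAS) | PSPNet | IoU | 0.254 | #4 |
| Lesion Segmentation | Anatomical Tracings of Lesions After Stroke (ATLAS) | PSPNet | Precision | 0.4769 | #5 |
| Lesion Segmentation | Anatomical Tracings of Lesions After Stroke (ATLAS) | PSPNet | Recall | 0.3335 | #4 |
| Real-Time Semantic Segmentation | CamVid | PSPNet | Time (ms) | 185.0 | #16 |
| Real-Time Semantic Segmentation | CamVid | PSPNet | Frame (fps) | 5.4 | #14 |
| Semantic Segmentation | Cityscapes test | PSPNet | Mean IoU (class) | 78.4% | #58 |
| Semantic Segmentation | Cityscapes test | PSPNet++ | Mean IoU (class) | 80.2% | #50 |
| Video Semantic Segmentation | Cityscapes val | PSPNet-101 [20] | mIoU | 79.7 | #4 |
| Video Semantic Segmentation | Cityscapes val | PSPNet-50 [20] | mIoU | 78.1 | #5 |
| Semantic Segmentation | Cityscapes val | PSPNet (Dilated-ResNet-101) | mIoU | 79.7 | #48 |
| Semantic Segmentation | DADA-seg | PSPNet (ResNet-101) | mIoU | 20.1 | #20 |
| Semantic Segmentation | DensePASS | PSPNet (ResNet-50) | mIoU | 29.5% | #28 |
| Dichotomous Image Segmentation | DIS-TE1 | PSPNet | max F-Measure | 0.645 | #12 |
| Dichotomous Image Segmentation | DIS-TE1 | PSPNet | weighted F-measure | 0.557 | #11 |
| Dichotomous Image Segmentation | DIS-TE1 | PSPNet | MAE | 0.089 | #9 |
| Dichotomous Image Segmentation | DIS-TE1 | PSPNet | S-Measure | 0.725 | #11 |
| Dichotomous Image Segmentation | DIS-TE1 | PSPNet | E-measure | 0.791 | #10 |
| Dichotomous Image Segmentation | DIS-TE1 | PSPNet | HCE | 267 | #16 |
| Dichotomous Image Segmentation | DIS-TE2 | PSPNet | max F-Measure | 0.724 | #10 |
| Dichotomous Image Segmentation | DIS-TE2 | PSPNet | weighted F-measure | 0.636 | #9 |
| Dichotomous Image Segmentation | DIS-TE2 | PSPNet | MAE | 0.092 | #9 |
| Dichotomous Image Segmentation | DIS-TE2 | PSPNet | S-Measure | 0.763 | #10 |
| Dichotomous Image Segmentation | DIS-TE2 | PSPNet | E-measure | 0.828 | #11 |
| Dichotomous Image Segmentation | DIS-TE2 | PSPNet | HCE | 586 | #17 |
| Dichotomous Image Segmentation | DIS-TE3 | PSPNet | max F-Measure | 0.747 | #13 |
| Dichotomous Image Segmentation | DIS-TE3 | PSPNet | weighted F-measure | 0.657 | #13 |
| Dichotomous Image Segmentation | DIS-TE3 | PSPNet | MAE | 0.092 | #12 |
| Dichotomous Image Segmentation | DIS-TE3 | PSPNet | S-Measure | 0.774 | #11 |
| Dichotomous Image Segmentation | DIS-TE3 | PSPNet | E-measure | 0.843 | #15 |
| Dichotomous Image Segmentation | DIS-TE3 | PSPNet | HCE | 1111 | #18 |
| Dichotomous Image Segmentation | DIS-TE4 | PSPNet | max F-Measure | 0.725 | #15 |
| Dichotomous Image Segmentation | DIS-TE4 | PSPNet | weighted F-measure | 0.630 | #15 |
| Dichotomous Image Segmentation | DIS-TE4 | PSPNet | MAE | 0.107 | #13 |
| Dichotomous Image Segmentation | DIS-TE4 | PSPNet | S-Measure | 0.758 | #13 |
| Dichotomous Image Segmentation | DIS-TE4 | PSPNet | E-measure | 0.815 | #17 |
| Dichotomous Image Segmentation | DIS-TE4 | PSPNet | HCE | 3806 | #16 |
| Dichotomous Image Segmentation | DIS-VD | PSPNet | max F-Measure | 0.691 | #13 |
| Dichotomous Image Segmentation | DIS-VD | PSPNet | weighted F-measure | 0.603 | #12 |
| Dichotomous Image Segmentation | DIS-VD | PSPNet | MAE | 0.102 | #9 |
| Dichotomous Image Segmentation | DIS-VD | PSPNet | S-Measure | 0.744 | #12 |
| Dichotomous Image Segmentation | DIS-VD | PSPNet | E-measure | 0.802 | #12 |
| Dichotomous Image Segmentation | DIS-VD | PSPNet | HCE | 1588 | #16 |
| Thermal Image Segmentation | MFN Dataset | PSPNet | mIOU | 46.1 | #38 |
| Real-Time Semantic Segmentation | NYU Depth v2 | PSPNet101 | mIoU | 43.2 | #5 |
| Real-Time Semantic Segmentation | NYU Depth v2 | PSPNet101 | Speed (ms/f) | 72 | #10 |
| Real-Time Semantic Segmentation | NYU Depth v2 | PSPNet50 | mIoU | 41.8 | #7 |
| Real-Time Semantic Segmentation | NYU Depth v2 | PSPNet50 | Speed (ms/f) | 47 | #9 |
| Real-Time Semantic Segmentation | NYU Depth v2 | PSPNet18 | mIoU | 35.9 | #10 |
| Real-Time Semantic Segmentation | NYU Depth v2 | PSPNet18 | Speed (ms/f) | 19 | #3 |
| Semantic Segmentation | PASCAL Context | PSPNet (ResNet-101) | mIoU | 47.8 | #50 |
| Semantic Segmentation | PASCAL VOC 2012 test | PSPNet (ResNet-101) | Mean IoU | 82.6% | #25 |
| Semantic Segmentation | PASCAL VOC 2012 test | PSPNet | Mean IoU | 85.4% | #10 |
| Semantic Segmentation | ScanNetV2 | PSPNet | Mean IoU | 47.5% | #9 |
| Semantic Segmentation | SELMA | PSPNet | mIoU | 68.4 | #5 |
| Semantic Segmentation | Trans10K | PSPNet | mIoU | 68.23% | #8 |
| Semantic Segmentation | Trans10K | PSPNet | GFLOPs | 187.03 | #14 |
| Semantic Segmentation | UrbanLF | PSPNet | mIoU (Real) | 76.34 | #9 |
| Semantic Segmentation | UrbanLF | PSPNet | mIoU (Syn) | 75.78 | #11 |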
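
Most of the results above are reported as mean intersection-over-union (mIoU). As a rough illustration of how that metric is commonly computed, here is a minimal NumPy sketch built on a confusion matrix. It is an assumption-laden example, not the evaluation code of any listed benchmark: each benchmark has its own rules for ignored labels and for how counts are accumulated across images, and the function names here are illustrative.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes):
    """Count pixels per (target class, predicted class) pair.

    Pixels whose target label falls outside [0, num_classes) are skipped,
    which is one common way to handle an "ignore" label (an assumption here).
    Predictions are expected to lie in [0, num_classes).
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    mask = (target >= 0) & (target < num_classes)
    idx = num_classes * target[mask].astype(int) + pred[mask].astype(int)
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def mean_iou(pred, target, num_classes):
    """Average per-class IoU over classes that appear in pred or target."""
    cm = confusion_matrix(pred, target, num_classes)
    tp = np.diag(cm)                          # correctly labelled pixels
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    valid = union > 0                         # skip classes absent from both
    return float((tp[valid] / union[valid]).mean())
```

For example, with `pred = [[0, 0], [1, 1]]` and `target = [[0, 1], [1, 1]]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so `mean_iou(pred, target, 2)` evaluates to 7/12 (about 0.583).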

Methods