Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

ECCV 2018 Liang-Chieh ChenYukun ZhuGeorge PapandreouFlorian SchroffHartwig Adam

Spatial pyramid pooling module or encode-decoder structure are used in deep neural networks for semantic segmentation task. The former networks are able to encode multi-scale contextual information by probing the incoming features with filters or pooling operations at multiple rates and multiple effective fields-of-view, while the latter networks can capture sharper object boundaries by gradually recovering the spatial information... (read more)

PDF Abstract

Evaluation results from the paper

Task Dataset Model Metric name Metric value Global rank Uses extra
training data
Semantic Segmentation Cityscapes DeepLabv3+ (Xception-JFT) Mean IoU (class) 82.1% # 5
Image Classification ImageNet Modified Aligned Xception Top 1 Accuracy 79.81% # 25
Image Classification ImageNet Modified Aligned Xception Top 5 Accuracy 94.83% # 22
Semantic Segmentation PASCAL VOC 2012 DeepLabv3+ (Xception-JFT) Mean IoU 89.0% # 1