Rethinking Dilated Convolution for Real-time Semantic Segmentation

18 Nov 2021  ·  Roland Gao

The field-of-view is an important metric when designing a model for semantic segmentation. To obtain a large field-of-view, previous approaches generally downsample the resolution rapidly, usually with average pooling or stride-2 convolutions. We take a different approach by using dilated convolutions with large dilation rates throughout the backbone, allowing the backbone to easily tune its field-of-view by adjusting its dilation rates, and show that this approach is competitive with existing ones. To use dilated convolutions effectively, we show a simple upper bound on the dilation rate that avoids leaving gaps between the convolutional weights, and design an SE-ResNeXt-inspired block structure that uses two parallel $3\times 3$ convolutions with different dilation rates to preserve local details. Manually tuning the dilation rates for every block can be difficult, so we also introduce a differentiable neural architecture search method that uses gradient descent to optimize the dilation rates. In addition, we propose a lightweight decoder that restores local information better than common alternatives. Our model, RegSeg, demonstrates the effectiveness of this approach, achieving competitive results on the real-time Cityscapes and CamVid benchmarks. Using a T4 GPU with mixed precision, RegSeg achieves 78.3 mIoU on the Cityscapes test set at $37$ FPS and 80.9 mIoU on the CamVid test set at $112$ FPS, both without ImageNet pretraining.
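The trade-off the abstract describes, enlarging the field-of-view by dilation instead of by further downsampling, can be illustrated with the standard receptive-field recurrence for stacked convolutions. The sketch below is only an illustration: the two layer stacks are hypothetical and are not the RegSeg configuration, and the helper names `effective_kernel` and `receptive_field` are made up for this example.

```python
def effective_kernel(kernel: int, dilation: int) -> int:
    """Span, in input pixels along one axis, covered by one dilated kernel."""
    return dilation * (kernel - 1) + 1


def receptive_field(layers):
    """Receptive field along one axis of a stack of convolution layers.

    `layers` is a list of (kernel, stride, dilation) tuples, first layer first.
    """
    rf = 1    # receptive field of a single input pixel
    jump = 1  # distance between adjacent outputs, measured in input pixels
    for kernel, stride, dilation in layers:
        rf += (effective_kernel(kernel, dilation) - 1) * jump
        jump *= stride
    return rf


# Hypothetical stacks for illustration only (not the RegSeg architecture).
# Five stride-2 3x3 convolutions: output stride 32.
downsampling = [(3, 2, 1)] * 5
# Three stride-2 3x3 convolutions, then two dilated ones: output stride 8.
dilated = [(3, 2, 1)] * 3 + [(3, 1, 4), (3, 1, 8)]

print(receptive_field(downsampling))  # 63
print(receptive_field(dilated))       # 207
```

In this toy comparison the dilated stack sees a wider input region while keeping four times the spatial resolution along each axis. Note that a $3\times 3$ kernel with dilation rate $d$ samples its inputs $d$ pixels apart, which is why large rates can leave unsampled gaps and why a bound on the rate matters.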


Results from the Paper


Ranked #3 on Real-Time Semantic Segmentation on CamVid (using extra training data)

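| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Real-Time Semantic Segmentation | CamVid | RegSeg(Cityscapes-Pretrained) | mIoU | 80.9 | # 3 |
| Real-Time Semantic Segmentation | CamVid | RegSeg(Cityscapes-Pretrained) | Time (ms) | 14 | # 7 |
| Real-Time Semantic Segmentation | CamVid | RegSeg(Cityscapes-Pretrained) | Frame (fps) | 70 | # 7 |
| Real-Time Semantic Segmentation | Cityscapes test | RegSeg (no ImageNet pretraining) | mIoU | 78.3% | # 5 |
| Real-Time Semantic Segmentation | Cityscapes test | RegSeg (no ImageNet pretraining) | Time (ms) | 33 | # 18 |
| Real-Time Semantic Segmentation | Cityscapes test | RegSeg (no ImageNet pretraining) | Frame (fps) | 30 | # 20 |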
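
The Time (ms) and Frame (fps) entries above are related by latency = 1000 / fps. A quick check, assuming the listed times were rounded to the nearest millisecond (the page does not say how they were derived, and `latency_ms` is a name chosen for this sketch):

```python
def latency_ms(fps: float) -> float:
    """Per-frame latency in milliseconds for a given throughput in frames/s."""
    return 1000.0 / fps


# Frame rates as listed in the results above.
for fps in (70, 30):
    print(fps, "fps ->", round(latency_ms(fps)), "ms")
```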

Methods