Temporal Memory Attention for Video Semantic Segmentation

17 Feb 2021 · Hao Wang, Weining Wang, Jing Liu

Video semantic segmentation requires exploiting the complex temporal relations between the frames of a video sequence. Previous works usually leverage these temporal relations through accurate optical flow, which incurs a heavy computational cost. In this paper, we propose a Temporal Memory Attention Network (TMANet) that adaptively integrates long-range temporal relations over the video sequence using the self-attention mechanism, without exhaustive optical flow prediction. Specifically, we construct a memory from several past frames to store temporal information for the current frame. We then propose a temporal memory attention module that captures the relations between the current frame and the memory to enhance the representation of the current frame. Our method achieves new state-of-the-art performance on two challenging video semantic segmentation datasets, namely 80.3% mIoU on Cityscapes and 76.5% mIoU on CamVid with ResNet-50.
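The memory-attention step described in the abstract can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the features of T past frames are stacked as the memory, queries come from the current frame while keys and values come from the memory, and the random matrices `w_q`, `w_k`, `w_v` are hypothetical stand-ins for learned projections (1x1 convolutions in a real network). The sqrt(D) scaling and the final concatenation with the current features are also assumptions of this sketch.

```python
import numpy as np


def temporal_memory_attention(current, memory, w_q, w_k, w_v):
    """Simplified sketch: attend from the current frame to a memory of past frames.

    current:  (C, H, W) features of the current frame
    memory:   (T, C, H, W) features of T past frames (the "memory")
    w_q, w_k: (C, D) query/key projections (hypothetical stand-ins for learned 1x1 convs)
    w_v:      (C, C) value projection (hypothetical)
    Returns (2C, H, W): current features concatenated with the memory read-out
    (the concatenation is an assumption of this sketch).
    """
    c, h, w = current.shape
    t = memory.shape[0]
    # Each spatial position of the current frame becomes one query token.
    q_in = current.reshape(c, h * w).T                         # (HW, C)
    # Every position of every memory frame becomes one key/value token.
    m_in = memory.transpose(0, 2, 3, 1).reshape(t * h * w, c)  # (THW, C)
    q = q_in @ w_q                                             # (HW, D)
    k = m_in @ w_k                                             # (THW, D)
    v = m_in @ w_v                                             # (THW, C)
    # Scaled dot-product affinities (the sqrt(D) scaling is an assumption).
    logits = q @ k.T / np.sqrt(q.shape[1])                     # (HW, THW)
    logits -= logits.max(axis=1, keepdims=True)                # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)                    # softmax over all THW memory positions
    read = attn @ v                                            # (HW, C)
    read_map = read.T.reshape(c, h, w)                         # back to (C, H, W)
    return np.concatenate([current, read_map], axis=0)


# Usage with random features: 3 past frames, 8 channels, a 5x6 feature map.
rng = np.random.default_rng(0)
C, D, T, H, W = 8, 4, 3, 5, 6
current = rng.standard_normal((C, H, W))
memory = rng.standard_normal((T, C, H, W))
out = temporal_memory_attention(
    current, memory,
    rng.standard_normal((C, D)), rng.standard_normal((C, D)), rng.standard_normal((C, C)),
)
print(out.shape)  # (16, 5, 6)
```

Flattening time and space into one token axis lets a single softmax weigh evidence from every past frame at once, so no explicit frame-to-frame alignment such as optical-flow warping is needed.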

Task                         Dataset         Model            Metric       Value  Global Rank
Video Semantic Segmentation  CamVid          GRFP [15]        Mean IoU     67.1   #6
Video Semantic Segmentation  CamVid          PSPNet-101 [20]  Mean IoU     76.5   #1
Video Semantic Segmentation  CamVid          PSPNet-50 [20]   Mean IoU     76     #4
Video Semantic Segmentation  CamVid          TDNet-50 [9]     Mean IoU     76.2   #3
Video Semantic Segmentation  CamVid          Netwarp [7]      Mean IoU     74.7   #5
Video Semantic Segmentation  Cityscapes val  TMANet-50        mIoU         80.3   #1
Semantic Segmentation        UrbanLF         TMANet           mIoU (Real)  77.14  #7
Semantic Segmentation        UrbanLF         TMANet           mIoU (Syn)   76.41  #10
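The scores in the table are mean intersection-over-union (mIoU): for each class, IoU = TP / (TP + FP + FN) over labelled pixels, and mIoU is the average of these per-class values. A minimal NumPy sketch of this standard definition follows; the function name `mean_iou` and the ignore label 255 are illustrative assumptions. Benchmark evaluation scripts typically accumulate the confusion matrix over the whole evaluation set before computing per-class IoU, rather than averaging per-image scores.

```python
import numpy as np


def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean intersection-over-union from integer label maps of equal shape.

    Assumes prediction values lie in [0, num_classes); ignore_index=255 is an
    illustrative choice of "unlabelled" marker in the ground truth.
    """
    valid = target != ignore_index
    # Confusion matrix: row = ground-truth class, column = predicted class.
    idx = num_classes * target[valid].astype(np.int64) + pred[valid].astype(np.int64)
    conf = np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)
    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp   # predicted as class c, but labelled otherwise
    fn = conf.sum(axis=1) - tp   # labelled class c, but predicted otherwise
    denom = tp + fp + fn
    # IoU_c = TP / (TP + FP + FN); classes absent from both maps become NaN and are skipped.
    iou = np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)
    return float(np.nanmean(iou))


target = np.array([[0, 0, 1], [1, 2, 255]])  # 255 marks an ignored pixel
pred = np.array([[0, 1, 1], [1, 2, 0]])
print(round(mean_iou(pred, target, num_classes=3), 4))  # 0.7222
```

Here the per-class IoUs are 1/2, 2/3 and 1, so the mean is 13/18.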