SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation

Single-stage instance segmentation approaches have recently gained popularity due to their speed and simplicity, but they still lag behind two-stage methods in accuracy. We propose a fast single-stage instance segmentation method, called SipMask, that preserves instance-specific spatial information by separating the mask prediction of an instance into different sub-regions of a detected bounding-box. Our main contribution is a novel light-weight spatial preservation (SP) module that generates a separate set of spatial coefficients for each sub-region within a bounding-box, leading to improved mask predictions. It also enables accurate delineation of spatially adjacent instances. Further, we introduce a mask alignment weighting loss and a feature alignment scheme to better correlate mask prediction with object detection. On COCO test-dev, SipMask outperforms the existing single-stage methods. Compared to the state-of-the-art single-stage TensorMask, SipMask obtains an absolute gain of 1.0% (mask AP), while providing a four-fold speedup. In terms of real-time capabilities, SipMask outperforms YOLACT with an absolute gain of 3.0% (mask AP) under similar settings, while operating at comparable speed on a Titan Xp. We also evaluate SipMask for real-time video instance segmentation, achieving promising results on the YouTube-VIS dataset. The source code is available at https://github.com/JialeCao001/SipMask.
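The abstract says the SP module predicts a separate set of spatial coefficients for each sub-region of a bounding-box. The NumPy sketch below shows how such per-sub-region coefficients could turn a shared set of masks into one instance mask. It is not the official SipMask code. Several details are assumptions made only for illustration: a YOLACT-style set of shared basis masks, a 2x2 split of the box, a sigmoid on the combined logits, and the hypothetical function name `assemble_subregion_mask`.

```python
import numpy as np


def assemble_subregion_mask(basis, coeffs, box):
    """Hypothetical sketch of sub-region mask assembly (not the official SipMask code).

    basis  : (K, H, W) array of K basis masks (assumed YOLACT-style prototypes)
    coeffs : (4, K) array, one coefficient set per sub-region, ordered
             top-left, top-right, bottom-left, bottom-right (assumed 2x2 split)
    box    : (x0, y0, x1, y1) integer pixel box, upper bounds exclusive
    returns: (H, W) soft mask in [0, 1]; pixels outside the box stay 0
    """
    _, H, W = basis.shape
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) // 2, (y0 + y1) // 2  # split point of the box
    regions = [
        (slice(y0, cy), slice(x0, cx)),  # top-left
        (slice(y0, cy), slice(cx, x1)),  # top-right
        (slice(cy, y1), slice(x0, cx)),  # bottom-left
        (slice(cy, y1), slice(cx, x1)),  # bottom-right
    ]
    mask = np.zeros((H, W), dtype=float)
    for c, (rows, cols) in zip(coeffs, regions):
        # Linearly combine the basis masks with this sub-region's own
        # coefficients, then write the result only inside that sub-region.
        logits = np.tensordot(c, basis[:, rows, cols], axes=1)
        mask[rows, cols] = 1.0 / (1.0 + np.exp(-logits))
    return mask


if __name__ == "__main__":
    # Two toy basis masks: one all-positive, one all-negative.
    basis = np.stack([np.ones((8, 8)), -np.ones((8, 8))])
    # Left sub-regions favour basis 0, right sub-regions favour basis 1.
    coeffs = np.array([[5.0, 0.0], [0.0, 5.0], [5.0, 0.0], [0.0, 5.0]])
    m = assemble_subregion_mask(basis, coeffs, (2, 2, 6, 6))
    print((m > 0.5).astype(int))
```

Each quadrant gets its own coefficients. In the toy example, the right half of the box is therefore suppressed even though the same basis masks are active there. With a single shared coefficient set per instance, as in YOLACT, every pixel of the box would receive the same combination. This illustrates, under the assumptions above, why per-sub-region coefficients can help separate spatially adjacent instances.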

ECCV 2020

Results (each cell gives the metric value, followed by the global leaderboard rank as #N)

Instance Segmentation, COCO test-dev

| Model | mask AP | AP50 | AP75 | APS | APM | APL |
|---|---|---|---|---|---|---|
| SipMask (ResNet-101, single-scale test) | 38.1 (#51) | 60.2 (#24) | 40.8 (#20) | 17.8 (#26) | 40.8 (#22) | 54.3 (#14) |

Real-time Instance Segmentation, MSCOCO (frame rate measured on a Titan Xp)

| Model | Frame rate (fps) | mask AP | AP50 | AP75 | APS | APM | APL |
|---|---|---|---|---|---|---|---|
| SipMask++ (ResNet-101, single-scale test) | 27.0 (#11) | 35.4 (#5) | 55.6 (#6) | 37.6 (#5) | 11.2 (#7) | 38.3 (#3) | 56.8 (#3) |
| SipMask (ResNet-50, single-scale test) | 41.7 (#3) | 31.2 (#11) | 51.9 (#9) | 32.3 (#9) | 9.2 (#9) | 33.6 (#9) | 49.8 (#8) |
| SipMask (ResNet-101, single-scale test) | 31.3 (#7) | 32.8 (#10) | 53.4 (#8) | 34.3 (#8) | 9.3 (#8) | 35.6 (#7) | 54.0 (#5) |

Video Instance Segmentation, YouTube-VIS validation

| Model | mask AP | AP50 | AP75 | AR1 | AR10 |
|---|---|---|---|---|---|
| SipMask (ResNet-50, ms-train, single-scale test) | 33.7 (#20) | 54.1 (#20) | 35.8 (#21) | 35.4 (#14) | 40.1 (#17) |
| SipMask (ResNet-50, single-scale test) | 32.5 (#22) | 53.0 (#21) | 33.3 (#23) | 33.5 (#17) | 38.9 (#18) |
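The image-level results above report AP50, AP75, APS, APM and APL. Under the standard COCO protocol, AP50 and AP75 count a predicted mask as correct when its IoU with the ground-truth mask is at least 0.5 or 0.75. The headline mask AP averages over IoU thresholds from 0.50 to 0.95 in steps of 0.05. APS, APM and APL restrict evaluation to small (area < 32² px), medium and large (area ≥ 96² px) objects.

The sketch below illustrates only these two ingredients: mask IoU with a threshold test, and the size buckets. It assumes the benchmarks follow that protocol. It omits the full matching and precision-recall computation done by the official COCO evaluation tools. It also does not cover the YouTube-VIS metrics, which extend IoU over whole video sequences. The function names are hypothetical.

```python
import numpy as np


def mask_iou(pred, gt):
    """Intersection-over-union between two binary masks of the same shape."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(pred, gt).sum() / union)


def coco_size_bucket(area):
    """Assign an object area (in pixels) to the COCO small/medium/large range."""
    if area < 32 ** 2:
        return "small"   # counted in APS
    if area < 96 ** 2:
        return "medium"  # counted in APM
    return "large"       # counted in APL


def matches_at(pred, gt, iou_threshold):
    """True if pred would count as a hit at this IoU threshold (0.5 for AP50, 0.75 for AP75)."""
    return mask_iou(pred, gt) >= iou_threshold


if __name__ == "__main__":
    pred = np.zeros((10, 10), dtype=np.uint8)
    gt = np.zeros((10, 10), dtype=np.uint8)
    pred[0:4, 0:4] = 1  # 16 pixels
    gt[0:4, 2:6] = 1    # 16 pixels, overlapping pred on 8 pixels
    print(mask_iou(pred, gt), matches_at(pred, gt, 0.5), coco_size_bucket(int(gt.sum())))
```

In the usage example, the two 4x4 masks overlap on 8 of 24 covered pixels. That gives an IoU of 1/3, so the prediction would miss at both the AP50 and the AP75 threshold.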
