AggMask: Exploring locally aggregated learning of mask representations for instance segmentation

1 Jan 2021 · Tao Wang, Jun Hao Liew, Yu Li, Yunpeng Chen, Jiashi Feng

Recently proposed one-stage instance segmentation models (e.g., SOLO) learn to directly predict location-specific object masks with fully convolutional networks. They perform comparably to the traditional two-stage Mask R-CNN model while enjoying a much simpler architecture and higher efficiency. However, an intrinsic limitation of these models is that they tend to generate similar mask predictions for a single object at nearby locations, and most of these are directly discarded by non-maximum suppression, wasting useful predictions that could supplement the final result. In this work, we explore how the model can benefit from better leveraging the neighboring predictions while maintaining architectural simplicity and efficiency. To this end, we develop a novel learning-based aggregation framework that learns to aggregate the neighboring predictions. Meanwhile, unlike the original location-based masks, the segmentation model is implicitly supervised to learn location-aware mask representations that encode the geometric structure of nearby objects and complement adjacent representations with context. Based on the aggregation framework, we further introduce a mask interpolation mechanism that enables sharing mask representations across nearby spatial locations, allowing the model to generate far fewer representations and thus save computation and memory. We experimentally show that simply augmenting the baseline model with our proposed aggregation framework significantly improves instance segmentation performance. For instance, it improves a SOLO model with a ResNet-101 backbone by 2.0 AP on the COCO benchmark, with only about a 2% increase in computation. Code and models are available at an anonymous repository: https://github.com/advdfacd/AggMask.
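
The idea of aggregating neighboring predictions can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `aggregate_neighbors`, the fixed 3x3 neighborhood, the zero padding at the grid border, and the externally supplied `weights` array (a stand-in for whatever parameters the learned aggregation module would produce) are all assumptions made here for illustration. Each grid location holds one mask representation, and the final mask at a location is formed as a weighted sum of the representations at that location and its neighbors, rather than from that location alone.

```python
import numpy as np

def aggregate_neighbors(reps, weights):
    """Weighted 3x3 aggregation of per-location mask representations.

    reps:    float array of shape (S_h, S_w, H, W), one H x W mask
             representation per grid location.
    weights: array of shape (3, 3); hypothetical stand-in for learned
             aggregation parameters (assumption, not from the paper).
    Returns an array of shape (S_h, S_w, H, W).
    """
    s_h, s_w = reps.shape[:2]
    # Zero-pad the two grid axes so border cells have a full 3x3 neighborhood.
    padded = np.pad(reps, ((1, 1), (1, 1), (0, 0), (0, 0)))
    out = np.zeros_like(reps, dtype=float)
    for di in range(3):
        for dj in range(3):
            # Shifted view of the grid: the neighbor at offset (di-1, dj-1).
            out += weights[di, dj] * padded[di:di + s_h, dj:dj + s_w]
    return out

# Usage: a 4x4 grid of 2x2 representations with uniform weights.
reps = np.ones((4, 4, 2, 2))
uniform = np.full((3, 3), 1.0 / 9.0)
aggregated = aggregate_neighbors(reps, uniform)
```

With weights that are zero everywhere except the center, the function returns the input unchanged, which corresponds to the baseline behavior of using each location's own prediction. In a trained model the weights would be learned jointly with the segmentation network instead of being fixed by hand.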
