Hierarchical Memory Matching Network for Video Object Segmentation

We present the Hierarchical Memory Matching Network (HMMN) for semi-supervised video object segmentation. Building on a recent memory-based method [33], we propose two advanced memory read modules that perform memory reading at multiple scales while exploiting temporal smoothness. We first propose a kernel guided memory matching module that replaces the non-local dense memory read commonly adopted in previous memory-based methods. The module imposes a temporal smoothness constraint on the memory read, leading to accurate memory retrieval. More importantly, we introduce a hierarchical memory matching scheme and propose a top-k guided memory matching module in which the memory read at a fine scale is guided by the memory read at a coarse scale. With this module, we efficiently perform memory reads at multiple scales and leverage both high-level semantic and low-level fine-grained memory features to predict detailed object masks. Our network achieves state-of-the-art performance on the validation sets of DAVIS 2016/2017 (90.8% and 84.7%) and YouTube-VOS 2018/2019 (82.6% and 82.5%), and on the test-dev set of DAVIS 2017 (78.6%). The source code and model are available online: https://github.com/Hongje/HMMN.
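The abstract contrasts the non-local dense memory read with two guided alternatives. The NumPy sketch below illustrates the three kinds of read on flattened feature maps; it is not the HMMN implementation. The function names, the Gaussian kernel centred on each query pixel's own coordinates in every memory frame, `sigma`, `k`, and the exact 2x resolution ratio between the coarse and fine levels are all illustrative assumptions, and the paper's actual kernel construction and scales may differ.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def dense_read(q_key, m_key, m_val):
    """Non-local dense read: every query location attends to every memory location.

    q_key: (Nq, C), m_key: (Nm, C), m_val: (Nm, D) -> (Nq, D)
    """
    attn = softmax(q_key @ m_key.T, axis=1)  # (Nq, Nm)
    return attn @ m_val


def grid_coords(h, w):
    # (h*w, 2) array of (row, col) coordinates in row-major order.
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)


def kernel_guided_read(q_key, m_key, m_val, h, w, t, sigma=2.0):
    """Dense read with a Gaussian spatial prior (hypothetical kernel choice).

    Assumption: the kernel for query pixel p is centred on p's own coordinates
    in each of the t memory frames, so distant memory locations are suppressed.
    """
    q_xy = grid_coords(h, w)                   # (Nq, 2)
    m_xy = np.tile(grid_coords(h, w), (t, 1))  # (t*h*w, 2), frames stacked
    d2 = ((q_xy[:, None, :] - m_xy[None, :, :]) ** 2).sum(-1)  # (Nq, Nm)
    # Multiplying exp(similarity) by a Gaussian equals adding its log to the logits.
    logits = q_key @ m_key.T - d2 / (2.0 * sigma ** 2)
    return softmax(logits, axis=1) @ m_val


def topk_guided_read(qc, mc, qf, mf, vf, hc, wc, t, k=4):
    """Fine-scale read restricted to the children of coarse top-k matches.

    qc: (hc*wc, C1) coarse query keys, mc: (t*hc*wc, C1) coarse memory keys,
    qf: (4*hc*wc, C2) fine query keys, mf: (t*4*hc*wc, C2) fine memory keys,
    vf: (t*4*hc*wc, D) fine memory values. Assumes the fine grid is exactly 2x.
    """
    hf, wf = 2 * hc, 2 * wc
    # 1) Coarse matching: top-k memory indices for every coarse query location.
    topk = np.argsort(-(qc @ mc.T), axis=1)[:, :k]  # (hc*wc, k)
    # Decompose each coarse memory index into (frame, row, col).
    ft, rem = np.divmod(topk, hc * wc)
    cy, cx = np.divmod(rem, wc)
    out = np.zeros((hf * wf, vf.shape[1]))
    # Plain Python loop over fine query pixels, written for readability, not speed.
    for y in range(hf):
        for x in range(wf):
            parent = (y // 2) * wc + (x // 2)
            # 2) Each coarse match covers a 2x2 block of fine memory locations.
            cand = np.array([
                ft[parent, j] * hf * wf
                + (2 * cy[parent, j] + dy) * wf
                + (2 * cx[parent, j] + dx)
                for j in range(k) for dy in (0, 1) for dx in (0, 1)
            ])
            # 3) Softmax over the 4k candidates only, not all t*hf*wf locations.
            qi = y * wf + x
            attn = softmax(qf[qi] @ mf[cand].T)
            out[qi] = attn @ vf[cand]
    return out
```

Two properties of this sketch make the design choice concrete: with `k` equal to the number of coarse memory locations, `topk_guided_read` reduces to `dense_read` on the fine features, and with a very large `sigma`, `kernel_guided_read` reduces to `dense_read`. Smaller `k` trades exactness for a candidate set of 4k fine memory positions per query pixel instead of all of them.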

ICCV 2021
Benchmark results (task: Semi-Supervised Video Object Segmentation; model: HMMN)

Dataset                            Metric              Value  Global rank
DAVIS 2016                         Jaccard (Mean)       89.6  #24
DAVIS 2016                         F-measure (Mean)     92.0  #25
DAVIS 2016                         J&F                  90.8  #24
DAVIS 2017 (test-dev)              J&F                  78.6  #23
DAVIS 2017 (test-dev)              Jaccard (Mean)       74.7  #23
DAVIS 2017 (test-dev)              F-measure (Mean)     82.5  #23
DAVIS 2017 (val)                   Jaccard (Mean)       81.9  #24
DAVIS 2017 (val)                   F-measure (Mean)     87.5  #25
DAVIS 2017 (val)                   J&F                  84.7  #25
DAVIS (no YouTube-VOS training)    FPS                  10.0  #13
DAVIS (no YouTube-VOS training)    D16 val (G)          89.4  #1
DAVIS (no YouTube-VOS training)    D16 val (J)          88.2  #1
DAVIS (no YouTube-VOS training)    D16 val (F)          90.6  #1
DAVIS (no YouTube-VOS training)    D17 val (G)          80.4  #1
DAVIS (no YouTube-VOS training)    D17 val (J)          77.7  #1
DAVIS (no YouTube-VOS training)    D17 val (F)          83.1  #1
YouTube-VOS 2018                   F-measure (Seen)     87.0  #30
YouTube-VOS 2018                   F-measure (Unseen)   84.6  #33
YouTube-VOS 2018                   Overall              82.6  #31
YouTube-VOS 2018                   Jaccard (Seen)       82.1  #30
YouTube-VOS 2018                   Jaccard (Unseen)     76.8  #32
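On DAVIS, J&F is the arithmetic mean of the region similarity J (Jaccard) and the contour accuracy F, and the YouTube-VOS Overall score averages J and F over the seen and unseen categories. The short check below recomputes these aggregates from the per-metric HMMN values listed above, rounded to one decimal place; the dictionary layout and helper names are illustrative only.

```python
# Per-metric HMMN scores copied from the results table: (J mean, F mean, reported J&F).
davis = {
    "DAVIS 2016": (89.6, 92.0, 90.8),
    "DAVIS 2017 (val)": (81.9, 87.5, 84.7),
    "DAVIS 2017 (test-dev)": (74.7, 82.5, 78.6),
    "D16 val, no YouTube-VOS training": (88.2, 90.6, 89.4),
    "D17 val, no YouTube-VOS training": (77.7, 83.1, 80.4),
}


def j_and_f(j, f):
    # DAVIS J&F: mean of the region (J) and boundary (F) scores.
    return round((j + f) / 2, 1)


def ytvos_overall(j_seen, f_seen, j_unseen, f_unseen):
    # YouTube-VOS Overall: mean of the four seen/unseen J and F scores.
    return round((j_seen + f_seen + j_unseen + f_unseen) / 4, 1)


for name, (j, f, reported) in davis.items():
    print(f"{name}: computed {j_and_f(j, f)}, reported {reported}")

print("YouTube-VOS 2018: computed", ytvos_overall(82.1, 87.0, 76.8, 84.6), "reported 82.6")
```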
