Attention-Guided Hierarchical Structure Aggregation for Image Matting

Existing deep-learning-based matting algorithms rely primarily on high-level semantic features to improve the overall structure of alpha mattes. However, we argue that the advanced semantics extracted by CNNs contribute unequally to alpha perception, and that they should be reconciled with low-level appearance cues to refine foreground details. In this paper, we propose an end-to-end Hierarchical Attention Matting Network (HAttMatting), which predicts better-structured alpha mattes from single RGB images without any additional input. Specifically, we employ spatial and channel-wise attention to integrate appearance cues and pyramidal features in a novel fashion. This blended attention mechanism perceives alpha mattes through refined boundaries and adaptive semantics. We also introduce a hybrid loss function fusing Structural SIMilarity (SSIM), Mean Squared Error (MSE), and adversarial loss to guide the network toward further improving the overall foreground structure. In addition, we construct a large-scale image matting dataset comprising 59,600 training images and 1,000 test images (646 distinct foreground alpha mattes in total), which further improves the robustness of our hierarchical structure aggregation model. Extensive experiments demonstrate that the proposed HAttMatting captures sophisticated foreground structure and achieves state-of-the-art performance with single RGB images as input.
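
The abstract names two concrete mechanisms: a blended spatial and channel-wise attention that fuses pyramidal semantic features with low-level appearance cues, and a hybrid SSIM + MSE + adversarial loss. The sketch below illustrates how such a combination could be wired up in PyTorch. It is a minimal illustration under assumptions, not the authors' implementation: the module and function names (`BlendedAttentionFusion`, `hybrid_matting_loss`), the squeeze-and-excitation-style channel gate, the 7x7 spatial gate, the loss weights, and the use of the third-party `pytorch_msssim` package for SSIM are all hypothetical choices.

```python
# Hypothetical sketch (not the authors' code) of the two ideas in the
# abstract: (1) channel-wise attention over pyramidal high-level features
# plus a spatial attention map that gates low-level appearance cues, and
# (2) a hybrid SSIM + MSE + adversarial loss.
import torch
import torch.nn as nn
import torch.nn.functional as F
from pytorch_msssim import ssim  # third-party SSIM; assumed available


class BlendedAttentionFusion(nn.Module):
    """Fuses high-level pyramidal features with low-level appearance cues."""

    def __init__(self, high_ch: int, low_ch: int, reduction: int = 16):
        super().__init__()
        # Channel-wise attention (squeeze-and-excitation style) that
        # re-weights pyramidal feature channels ("adaptive semantics").
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(high_ch, high_ch // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(high_ch // reduction, high_ch, 1),
            nn.Sigmoid(),
        )
        # Spatial attention map predicted from the semantic features and
        # used to gate the appearance cues ("refined boundaries").
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(high_ch, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )
        self.proj = nn.Conv2d(high_ch + low_ch, low_ch, kernel_size=3, padding=1)

    def forward(self, high: torch.Tensor, low: torch.Tensor) -> torch.Tensor:
        high = high * self.channel_gate(high)              # channel attention
        high = F.interpolate(high, size=low.shape[-2:],
                             mode="bilinear", align_corners=False)
        low = low * self.spatial_gate(high)                # spatial attention
        return self.proj(torch.cat([high, low], dim=1))    # fused features


def hybrid_matting_loss(pred, target, disc_score=None,
                        w_mse=1.0, w_ssim=1.0, w_adv=0.01):
    """MSE + (1 - SSIM) + optional non-saturating adversarial term."""
    loss = w_mse * F.mse_loss(pred, target)
    loss = loss + w_ssim * (1.0 - ssim(pred, target, data_range=1.0))
    if disc_score is not None:  # discriminator logits for the prediction
        loss = loss + w_adv * F.binary_cross_entropy_with_logits(
            disc_score, torch.ones_like(disc_score))
    return loss
```

For example, `BlendedAttentionFusion(256, 64)` applied to a (1, 256, 32, 32) pyramidal map and a (1, 64, 128, 128) appearance map returns a (1, 64, 128, 128) fused feature; the loss weights above are placeholders, not values from the paper.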

Datasets


Introduced in the Paper:

Distinctions-646

Used in the Paper:

Composition-1K, P3M-10k, AM-2k

Results from the Paper


Task           Dataset   Model  Metric  Value    Global Rank
Image Matting  AM-2k     HATT   SAD     28.01    #7
Image Matting  AM-2k     HATT   MSE     0.0055   #6
Image Matting  AM-2k     HATT   MAD     0.0161   #7
Image Matting  P3M-10k   HATT   SAD     25.99    #6
Image Matting  P3M-10k   HATT   MSE     0.0054   #5
Image Matting  P3M-10k   HATT   MAD     0.0152   #6

Methods


No methods listed for this paper.