ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders

17 Jul 2024  ·  Carlos Hinojosa, Shuming Liu, Bernard Ghanem

Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework, offering remarkable performance across a wide range of downstream tasks. To increase the difficulty of the pretext task and learn richer visual representations, existing works have focused on replacing standard random masking with more sophisticated strategies, such as adversarial-guided and teacher-guided masking. However, these strategies depend on the input data, which commonly increases model complexity and requires additional computation to generate the mask patterns. This raises the question: Can we enhance MAE performance beyond random masking without relying on input data or incurring additional computational costs? In this work, we introduce a simple yet effective data-independent method, termed ColorMAE, which generates different binary mask patterns by filtering random noise. Drawing inspiration from color noise in image processing, we explore four types of filters to yield mask patterns with different spatial and semantic priors. ColorMAE requires no additional learnable parameters or computational overhead in the network, yet it significantly enhances the learned representations. We provide a comprehensive empirical evaluation, demonstrating our strategy's superiority in downstream tasks compared to random masking. Notably, we report an improvement of 2.72 in mIoU in semantic segmentation tasks relative to baseline MAE implementations.
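
The idea of producing a binary mask by filtering random noise can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the filters, so the example assumes an ideal radial band-pass filter applied in the Fourier domain, a 14 x 14 patch grid (ViT-B/16 at 224 x 224 input), and the 75% mask ratio commonly used with MAE. The function name `filtered_noise_mask` and its parameters `low` and `high` are hypothetical. The point it shows is that the mask is computed from noise alone and never looks at the image.

```python
import numpy as np


def filtered_noise_mask(grid=14, mask_ratio=0.75, low=0.0, high=0.5, seed=0):
    """Return a (grid, grid) boolean array; True marks a masked patch.

    Hypothetical helper: `low` and `high` bound the radial frequency band
    (in cycles per patch) that is kept from the white noise.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((grid, grid))  # white noise, image-independent

    # Radial frequency of every FFT bin.
    fy = np.fft.fftfreq(grid)[:, None]
    fx = np.fft.fftfreq(grid)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)

    # Assumed filter: ideal band-pass keeping low <= radius <= high.
    keep = (radius >= low) & (radius <= high)
    filtered = np.fft.ifft2(np.fft.fft2(noise) * keep).real

    # Binarize: mask the patches with the largest filtered values so that
    # exactly round(mask_ratio * grid**2) patches are hidden.
    total = grid * grid
    num_masked = int(round(mask_ratio * total))
    order = np.argsort(filtered, axis=None)
    mask = np.zeros(total, dtype=bool)
    mask[order[total - num_masked:]] = True
    return mask.reshape(grid, grid)


# Example: a mid-frequency band and a low-frequency band give masks with
# different spatial structure but the same number of masked patches.
mid_band_mask = filtered_noise_mask(low=0.1, high=0.25)
low_band_mask = filtered_noise_mask(low=0.0, high=0.1)
```

Narrowing or shifting the kept band changes how clustered the masked patches are, which is one way to read the "different spatial priors" mentioned in the abstract; the band edges used here are illustrative values, not values taken from the paper.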

Results from the Paper


Task                   Dataset   Model                     Metric Name      Metric Value  Global Rank
Semantic Segmentation  ADE20K    ColorMAE-Green-ViTB-1600  Validation mIoU  49.3          #129
Instance Segmentation  COCO      ColorMAE-Green-ViTB-1600  maskAP           44.4          #1
Instance Segmentation  COCO      ColorMAE-Green-ViTB-1600  maskAP50         67.8          #1
Instance Segmentation  COCO      ColorMAE-Green-ViTB-1600  maskAP75         48            #1
Object Detection       COCO      ColorMAE-Green-ViTB-1600  boxAP            50.1          #1
Object Detection       COCO      ColorMAE-Green-ViTB-1600  boxAP50          70.7          #1
Object Detection       COCO      ColorMAE-Green-ViTB-1600  boxAP75          54.7          #1
Image Classification   ImageNet  ColorMAE-Green-ViTB-1600  Top 1 Accuracy   83.8%         #384

Methods