Unity of Opposites: SelfNorm and CrossNorm for Model Robustness

1 Jan 2021 · Zhiqiang Tang, Yunhe Gao, Yi Zhu, Zhi Zhang, Mu Li, Dimitris N. Metaxas

Studies have demonstrated a fundamental behavioral difference between human vision and Convolutional Neural Networks (CNNs). Human vision can recognize objects robustly, relying mainly on object shape (content). In contrast, CNNs are not only texture-biased but also sensitive to texture (style) noise, giving rise to security concerns in practice. To reduce CNNs' texture sensitivity and texture bias, this paper proposes SelfNorm and CrossNorm, respectively. SelfNorm uses attention to recalibrate feature styles, emphasizing primary styles and suppressing trivial ones. CrossNorm exchanges styles between feature channels to perform style augmentation, diversifying the mixtures of content and style. SelfNorm and CrossNorm explore opposite directions in utilizing style. Nevertheless, they complement each other and pursue the same goal: model robustness. Surprisingly, they can boost performance on corrupted and clean test data simultaneously. Extensive experiments on different tasks (classification and segmentation) and settings (supervised and semi-supervised) show their effectiveness.
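The two operations can be sketched from the abstract's description alone: a feature channel's "style" is summarized by its mean and standard deviation, CrossNorm swaps these statistics between two feature maps, and SelfNorm standardizes a map and rescales it with attention weights computed from its own statistics. The following is a minimal NumPy sketch under those assumptions; the function names and the identity-attention placeholder are illustrative, not the paper's exact implementation (which learns the attention functions).

```python
import numpy as np

def channel_stats(x, eps=1e-5):
    # x: (C, H, W) feature map; per-channel mean and std summarize its "style".
    mu = x.mean(axis=(1, 2), keepdims=True)
    sigma = x.std(axis=(1, 2), keepdims=True) + eps
    return mu, sigma

def crossnorm(x_a, x_b):
    # Style augmentation: swap channel-wise statistics between two feature
    # maps, so each keeps its content but adopts the other's style.
    mu_a, sig_a = channel_stats(x_a)
    mu_b, sig_b = channel_stats(x_b)
    a_in_b_style = (x_a - mu_a) / sig_a * sig_b + mu_b
    b_in_a_style = (x_b - mu_b) / sig_b * sig_a + mu_a
    return a_in_b_style, b_in_a_style

def selfnorm(x, attention=lambda mu, sigma: (1.0, 1.0)):
    # Style recalibration: standardize each channel, then rescale its mean
    # and std with attention weights f(mu, sigma), g(mu, sigma).
    # The paper learns these weights; here an identity placeholder is used.
    mu, sigma = channel_stats(x)
    f, g = attention(mu, sigma)
    return (x - mu) / sigma * (g * sigma) + f * mu
```

With the identity attention, `selfnorm` returns its input unchanged; after `crossnorm`, each output map matches the other input's per-channel mean and standard deviation, which is the style-exchange behavior the abstract describes.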


