ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness

Convolutional Neural Networks (CNNs) are commonly thought to recognise objects by learning increasingly complex representations of object shapes. Some recent studies suggest a more important role of image textures. We here put these conflicting hypotheses to a quantitative test by evaluating CNNs and human observers on images with a texture-shape cue conflict. We show that ImageNet-trained CNNs are strongly biased towards recognising textures rather than shapes, which is in stark contrast to human behavioural evidence and reveals fundamentally different classification strategies. We then demonstrate that the same standard architecture (ResNet-50) that learns a texture-based representation on ImageNet is able to learn a shape-based representation instead when trained on "Stylized-ImageNet", a stylized version of ImageNet. This provides a much better fit for human behavioural performance in our well-controlled psychophysical lab setting (nine experiments totalling 48,560 psychophysical trials across 97 observers) and comes with a number of unexpected emergent benefits such as improved object detection performance and previously unseen robustness towards a wide range of image distortions, highlighting advantages of a shape-based representation.
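The texture-shape cue-conflict evaluation described above can be summarised as a single "shape bias" number. The sketch below is a minimal illustration, not the authors' evaluation code: it assumes each cue-conflict image carries one shape category and one texture category, and it reports the fraction of shape decisions among all trials where the prediction matched either cue. The function name `shape_bias` and the list-based inputs are hypothetical choices made for this example.

```python
from typing import Sequence


def shape_bias(
    predictions: Sequence[str],
    shape_labels: Sequence[str],
    texture_labels: Sequence[str],
) -> float:
    """Fraction of shape decisions among trials decided by shape or texture.

    Assumption for this sketch: a trial counts only if the prediction
    matches the shape category or the texture category of the image.
    """
    shape_hits = 0
    texture_hits = 0
    for pred, shape, texture in zip(predictions, shape_labels, texture_labels):
        if shape == texture:
            # Not a cue-conflict image: both cues agree, so skip it.
            continue
        if pred == shape:
            shape_hits += 1
        elif pred == texture:
            texture_hits += 1
        # Predictions matching neither cue are ignored.
    decided = shape_hits + texture_hits
    if decided == 0:
        raise ValueError("no prediction matched a shape or texture category")
    return shape_hits / decided


if __name__ == "__main__":
    # Four images with a cat shape and an elephant-skin texture.
    preds = ["cat", "elephant", "dog", "cat"]
    shapes = ["cat"] * 4
    textures = ["elephant"] * 4
    print(f"shape bias: {shape_bias(preds, shapes, textures):.3f}")
```

Under this definition a value near 1 indicates mostly shape-based decisions and a value near 0 indicates mostly texture-based decisions, so the same function can be applied to model predictions and to human responses for comparison.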

ICLR 2019
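| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Domain Generalization | ImageNet-A | Stylized ImageNet (ResNet-50) | Top-1 accuracy (%) | 2.3 | #38 |
| Domain Generalization | ImageNet-C | Stylized ImageNet (ResNet-50) | mean Corruption Error (mCE) | 69.3 | #38 |
| Domain Generalization | ImageNet-R | Stylized ImageNet (ResNet-50) | Top-1 Error Rate | 58.5 | #34 |
| Out-of-Distribution Generalization | ImageNet-W | Style Transfer (ResNet-50) | IN-W Gap | -17.3 | #1 |
| Out-of-Distribution Generalization | ImageNet-W | Style Transfer (ResNet-50) | Carton Gap | +52 | #1 |
| Domain Generalization | VizWiz-Classification | ResNet-50 (SIN) | Accuracy - All Images | 25.3 | #86 |
| Domain Generalization | VizWiz-Classification | ResNet-50 (SIN) | Accuracy - Corrupted Images | 20.4 | #85 |
| Domain Generalization | VizWiz-Classification | ResNet-50 (SIN) | Accuracy - Clean Images | 30 | #86 |
| Domain Generalization | VizWiz-Classification | ResNet-50 (SIN_IN_IN) | Accuracy - All Images | 39.2 | #40 |
| Domain Generalization | VizWiz-Classification | ResNet-50 (SIN_IN_IN) | Accuracy - Corrupted Images | 32.4 | #42 |
| Domain Generalization | VizWiz-Classification | ResNet-50 (SIN_IN_IN) | Accuracy - Clean Images | 44.6 | #36 |
| Domain Generalization | VizWiz-Classification | ResNet-50 (SIN_IN) | Accuracy - All Images | 38.2 | #49 |
| Domain Generalization | VizWiz-Classification | ResNet-50 (SIN_IN) | Accuracy - Corrupted Images | 32.5 | #40 |
| Domain Generalization | VizWiz-Classification | ResNet-50 (SIN_IN) | Accuracy - Clean Images | 42.7 | #48 |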
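
For readers unfamiliar with the mean Corruption Error (mCE) reported for ImageNet-C, the sketch below shows how the metric is commonly computed: for each corruption type, the model's error rates summed over severity levels are divided by the corresponding sum for a baseline model (AlexNet in the ImageNet-C benchmark), and the resulting ratios are averaged and scaled to a percentage. The function name, the dictionary layout, and the numbers in the example are illustrative assumptions, not values from this page.

```python
from typing import Dict, Sequence


def mean_corruption_error(
    model_err: Dict[str, Sequence[float]],
    baseline_err: Dict[str, Sequence[float]],
) -> float:
    """Return mCE in percent.

    Both arguments map a corruption name to a sequence of top-1 error
    rates, one per severity level (hypothetical layout for this sketch).
    """
    ratios = []
    for corruption, errs in model_err.items():
        base = baseline_err[corruption]
        # Corruption Error: model error normalised by the baseline's error,
        # both summed over the severity levels of this corruption.
        ratios.append(sum(errs) / sum(base))
    # Average over corruption types and express as a percentage.
    return 100.0 * sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Made-up error rates for two corruptions at two severity levels.
    model = {"noise": [0.2, 0.4], "blur": [0.3, 0.3]}
    baseline = {"noise": [0.4, 0.8], "blur": [0.6, 0.6]}
    print(f"mCE: {mean_corruption_error(model, baseline):.1f}")
```

A value of 100 means the model is as error-prone as the baseline on corrupted images, so lower is better.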