In this paper, we study the sensitivity of CNN outputs with respect to image
transformations and noise in the area of fine-grained recognition. In
particular, we address three questions: (1) how sensitive are CNNs to the image
transformations encountered when images are captured in the wild; (2) how can
CNN sensitivity be predicted; and (3) can the robustness of CNNs to image
degradations be increased? To answer the first question, we
provide an extensive empirical sensitivity analysis of commonly used CNN
architectures (AlexNet, VGG19, GoogleNet) across various types of image
degradations. This analysis allows CNN performance to be predicted for new
domains comprising images of lower quality or captured from a different viewpoint. We
also show how the sensitivity of CNN outputs can be predicted for single
images. Furthermore, we demonstrate that input layer dropout or pre-filtering
at test time reduces CNN sensitivity only for high levels of degradation.
Experiments on fine-grained recognition tasks reveal that VGG19 is more
robust to severe image degradations than AlexNet and GoogleNet. However, even
low-level intensity noise can lead to dramatic changes in CNN performance, including for VGG19.
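
To make the notion of sensitivity to intensity noise concrete, the following minimal sketch measures how often a classifier's predicted label changes as zero-mean Gaussian noise of increasing standard deviation is added to its input. It is an illustration under stated assumptions, not the protocol used in the experiments: the function names, the "flip rate" measure, the random input data, and the toy threshold classifier standing in for a CNN are all hypothetical.

```python
import random


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian intensity noise and clip each pixel to [0, 1]."""
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in image]


def flip_rate(classify, images, sigma, rng):
    """Fraction of images whose predicted label changes after degradation."""
    flips = 0
    for image in images:
        clean_label = classify(image)
        noisy_label = classify(add_gaussian_noise(image, sigma, rng))
        flips += clean_label != noisy_label
    return flips / len(images)


def toy_classifier(image):
    """Hypothetical stand-in for a CNN: thresholds the mean intensity."""
    return int(sum(image) / len(image) > 0.5)


# Assumed toy data: 200 random "images" of 64 pixels with values in [0, 1).
rng = random.Random(0)
images = [[rng.random() for _ in range(64)] for _ in range(200)]

for sigma in (0.0, 0.05, 0.2):
    rate = flip_rate(toy_classifier, images, sigma, rng)
    print(f"sigma={sigma:.2f}  flip rate={rate:.3f}")
```

In practice, `toy_classifier` would be replaced by the forward pass of a trained network, and the same loop could be repeated for other degradation types to obtain a sensitivity curve per architecture.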