Neural networks trained to classify images do so by identifying features that allow them to distinguish between classes. These features are either causal or context dependent. Grad-CAM is a popular method for visualizing both sets of features. In this paper, we formalize this divide and provide a methodology to extract causal features from Grad-CAM. We do so by defining context features as those features that allow contrast between the predicted class and any contrast class. We then apply a set-theoretic approach to separate causal features from these contrast features in COVID-19 CT scans. We show that, on average, the image regions containing the proposed causal features require 15% fewer bits than Grad-CAM regions when encoded using Huffman encoding, while increasing classification accuracy by 3% on average over Grad-CAM. Moreover, we validate the transferability of causal features between networks and comment on the non-human-interpretable causal nature of current networks.
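The set-theoretic separation of causal from contrast features can be illustrated with a minimal sketch. It assumes that the Grad-CAM heat map for the predicted class and one heat map per contrast class have already been computed on the same spatial grid, and it treats the causal region as the thresholded Grad-CAM region minus the union of the thresholded contrast regions. The function names `binarize` and `causal_mask`, the min-max normalization, and the 0.5 threshold are hypothetical choices for illustration, not the paper's exact procedure.

```python
import numpy as np


def binarize(heatmap: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Min-max normalize a heat map to [0, 1] and keep pixels at or above `threshold`.

    Hypothetical helper: the normalization and threshold are illustrative assumptions.
    """
    lo, hi = heatmap.min(), heatmap.max()
    if hi == lo:
        # A flat map carries no localization information, so select nothing.
        return np.zeros(heatmap.shape, dtype=bool)
    norm = (heatmap - lo) / (hi - lo)
    return norm >= threshold


def causal_mask(gradcam_map: np.ndarray, contrast_maps, threshold: float = 0.5) -> np.ndarray:
    """Hypothetical set-theoretic separation: causal = GradCAM \\ (union of contrast regions)."""
    predicted = binarize(gradcam_map, threshold)
    # Union of all regions that contrast the predicted class against some other class.
    context = np.zeros_like(predicted)
    for cmap in contrast_maps:
        context |= binarize(cmap, threshold)
    # Keep only Grad-CAM pixels that do not appear in any contrast region.
    return predicted & ~context
```

For example, with a 2x2 Grad-CAM map `[[0.9, 0.8], [0.1, 0.7]]` and a single contrast map that highlights only the top-right pixel, the sketch keeps the top-left and bottom-right pixels as causal. The resulting boolean mask could then be used to crop the image region whose encoded size is compared against the full Grad-CAM region.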