no code implementations • ICLR 2021 • Charles Lovering, Rohan Jha, Tal Linzen, Ellie Pavlick
In this work, we test the hypothesis that the extent to which a feature influences a model's decisions can be predicted from a combination of two factors: (1) the feature's "extractability" after pre-training (measured using information-theoretic probing techniques), and (2) the "evidence" available during fine-tuning (defined as the feature's co-occurrence rate with the label).
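The "evidence" quantity above is concrete enough to sketch: for a binary feature, it is the fraction of target-label examples in which the feature appears. The toy data and the word-presence feature detector below are illustrative assumptions, not drawn from the paper.

```python
def cooccurrence_rate(examples, has_feature, label_of, target_label=1):
    """Fraction of examples with the target label in which the feature is present.

    A minimal sketch of 'evidence' as a feature-label co-occurrence rate;
    the function and argument names are hypothetical, not the authors' API.
    """
    positives = [ex for ex in examples if label_of(ex) == target_label]
    if not positives:
        return 0.0
    return sum(bool(has_feature(ex)) for ex in positives) / len(positives)

# Toy (sentence, label) pairs; the "feature" is presence of the word "not".
data = [
    ("this is not good", 1),
    ("great movie", 0),
    ("not bad at all", 1),
    ("terrible plot", 1),
]

rate = cooccurrence_rate(
    data,
    has_feature=lambda ex: "not" in ex[0].split(),
    label_of=lambda ex: ex[1],
)
```

Here the feature co-occurs with label 1 in two of three positive examples, so `rate` is 2/3.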
no code implementations • 30 Apr 2020 • Rohan Jha, Charles Lovering, Ellie Pavlick
Neural models often exploit superficial features to achieve good performance, rather than learning the more general features we intend them to use.