Measuring Association Between Labels and Free-Text Rationales

In interpretable NLP, we require faithful rationales that reflect the model's decision-making process for an explained instance. While prior work focuses on extractive rationales (a subset of the input words), we investigate their less-studied counterpart: free-text natural language rationales. We demonstrate that pipelines, existing models for faithful extractive rationalization on information-extraction style tasks, do not extend as reliably to "reasoning" tasks requiring free-text rationales. We turn to models that jointly predict and rationalize, a class of widely used high-performance models for free-text rationalization whose faithfulness is not yet established. We define label-rationale association as a necessary property for faithfulness: the internal mechanisms of the model producing the label and the rationale must be meaningfully correlated. We propose two measurements to test this property: robustness equivalence and feature importance agreement. We find that state-of-the-art T5-based joint models exhibit both properties for rationalizing commonsense question-answering and natural language inference, indicating their potential for producing faithful free-text rationales.

PDF Abstract EMNLP 2021 PDF EMNLP 2021 Abstract

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods