Re-Examining Human Annotations for Interpretable NLP

10 Apr 2022  ·  Cheng-Han Chiang, Hung-Yi Lee

Explanation methods in Interpretable NLP often explain a model's decision by extracting evidence (a rationale) from the input text that supports the decision. Benchmark datasets of rationales have been released to evaluate the quality of extracted rationales. The ground truth rationales in these datasets are often human annotations obtained via crowdsourcing websites. Valuable as these datasets are, the details of how those human annotations were obtained are often not clearly specified. We conduct comprehensive controlled experiments using crowdsourcing websites on two widely used datasets in Interpretable NLP to understand how those unstated details can affect the annotation results. Specifically, we compare the annotation results obtained from recruiting workers who satisfy different levels of qualification. We also provide high-quality workers with different instructions for completing the same underlying tasks. Our results reveal that annotation quality depends heavily on the workers' qualifications, and that workers can be guided by the instructions to provide certain annotations. We further show that specific explanation methods perform better when evaluated using the ground truth rationales obtained with particular instructions. Based on these observations, we highlight the importance of providing complete details of the annotation process and call for careful interpretation of any experimental results obtained using those annotations.
