Surprisingly, we find that pruned LLMs hallucinate less than their full-sized counterparts.
Explanation faithfulness of model predictions in natural language processing is typically evaluated on held-out data from the same temporal distribution as the training data (i.e. synchronous settings).
Recent work in natural language processing has focused on developing approaches that extract faithful explanations, either by identifying the most important tokens in the input (i.e. post-hoc explanations) or by designing inherently faithful models that first select the most important tokens and then use them to predict the correct label (i.e. select-then-predict models).
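The select-then-predict idea described above can be illustrated with a minimal sketch. It assumes token importance scores are already available and uses a simple top-k selection rule; the function names, the toy scores, and the keyword-based classifier are all hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple


def select_rationale(tokens: Sequence[str], scores: Sequence[float], k: int) -> List[str]:
    """Keep the k highest-scoring tokens, preserving their original order."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    k = max(0, min(k, len(tokens)))
    # Indices of the k largest scores (ties resolved by original position).
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    return [tokens[i] for i in sorted(top)]


def select_then_predict(
    tokens: Sequence[str],
    scores: Sequence[float],
    k: int,
    classifier: Callable[[List[str]], str],
) -> Tuple[str, List[str]]:
    """Predict from the selected tokens only, so the rationale is the model input."""
    rationale = select_rationale(tokens, scores, k)
    return classifier(rationale), rationale


def toy_classifier(rationale: List[str]) -> str:
    # Hypothetical stand-in for a trained classifier.
    return "positive" if "wonderful" in rationale else "negative"


# Toy example with made-up importance scores.
tokens = ["the", "film", "was", "truly", "wonderful"]
scores = [0.05, 0.30, 0.05, 0.20, 0.90]
label, rationale = select_then_predict(tokens, scores, 2, toy_classifier)
print(label, rationale)
```

Because the classifier only ever sees the selected tokens, the rationale is faithful by construction in this sketch; a post-hoc method would instead score tokens after a full-input model has made its prediction.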
Masked language modeling (MLM), a self-supervised pretraining objective, is widely used in natural language processing for learning text representations.
In this paper, we hypothesize that salient information extracted a priori from the training data can complement the task-specific information learned by the model during fine-tuning on a downstream task.
In this paper, we seek to improve the faithfulness of attention-based explanations for text classification.
Recent research on model interpretability in natural language processing extensively uses feature scoring methods to identify which parts of the input are the most important for a model's prediction (i.e. the explanation or rationale).