Previous work has found that datasets with paired inputs are prone to correlations between a specific part of the input (e. g., the hypothesis in NLI) and the label; consequently, models trained only on those outperform chance.
However, these editing methods have only been evaluated on statements about encyclopedic knowledge with a single correct answer.
For example, both normalization and prompting methods for reducing SFC can be ineffective or even detrimental to task performance for some LMs.
1 code implementation • • Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, Shashank Gupta, Bodhisattwa Prasad Majumder, Katherine Hermann, Sean Welleck, Amir Yazdanbakhsh, Peter Clark
Motivated by how humans refine their written text, we introduce Self-Refine, an approach for improving initial outputs from LLMs through iterative feedback and refinement.
We show that decomposition is an effective form of probing QA systems as well as a promising approach to explanation generation.
We create a pipeline that combines GPT-3 with a supervised filter that incorporates binary acceptability judgments from humans in the loop.
Transformer-based language model approaches to automated story generation currently provide state-of-the-art results.
In interpretable NLP, we require faithful rationales that reflect the model's decision-making process for an explained instance.
In NLP this often entails extracting snippets of an input text `responsible for' corresponding model output; when such a snippet comprises tokens that indeed informed the model's prediction, it is a faithful explanation.
We show that even when reliable adversarial distributions can be found, they don't perform well on the simple diagnostic, indicating that prior work does not disprove the usefulness of attention mechanisms for explainability.
Our method aggregates information across the document using a convolutional neural network, and uses an attention mechanism to select the most relevant segments for each of the thousands of possible codes.
Ranked #10 on Medical Code Prediction on MIMIC-III