1 code implementation • 6 Oct 2024 • Shramay Palta, Nishant Balepur, Peter Rankel, Sarah Wiegreffe, Marine Carpuat, Rachel Rudinger
Questions involving commonsense reasoning about everyday situations often admit many "possible" or "plausible" answers.
no code implementations • 21 Jul 2024 • Sarah Wiegreffe, Oyvind Tafjord, Yonatan Belinkov, Hannaneh Hajishirzi, Ashish Sabharwal
Multiple-choice question answering (MCQA) is a key competence of performant transformer language models that is tested by mainstream benchmarks.
1 code implementation • 2 Jul 2024 • Faeze Brahman, Sachin Kumar, Vidhisha Balachandran, Pradeep Dasigi, Valentina Pyatkin, Abhilasha Ravichander, Sarah Wiegreffe, Nouha Dziri, Khyathi Chandu, Jack Hessel, Yulia Tsvetkov, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi
Chat-based language models are designed to be helpful, yet they should not comply with every user request.
1 code implementation • 12 Jan 2024 • Peter Hase, Mohit Bansal, Peter Clark, Sarah Wiegreffe
In this paper, we present the surprising conclusion that current pretrained language models often generalize relatively well from easy to hard data, even performing as well as oracle models finetuned on hard data.
no code implementations • 16 Nov 2023 • Yanai Elazar, Bhargavi Paranjape, Hao Peng, Sarah Wiegreffe, Khyathi Raghavi, Vivek Srikumar, Sameer Singh, Noah A. Smith
Previous work has found that datasets with paired inputs are prone to correlations between a specific part of the input (e.g., the hypothesis in NLI) and the label; consequently, models trained on only that part of the input outperform chance.
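A minimal sketch of such a partial-input baseline, assuming scikit-learn and a toy three-example dataset (real experiments would use a full NLI corpus); above-chance accuracy from a model that never sees the premise signals a dataset artifact rather than genuine inference:

```python
# A hypothesis-only (partial-input) baseline: the premise is deliberately
# discarded. Toy data for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

hypotheses = ["A man is sleeping.", "Nobody is outside.", "A dog is running."]
labels = ["contradiction", "contradiction", "entailment"]  # toy labels

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(hypotheses, labels)  # trained on the hypothesis alone
print(clf.predict(["A woman is sleeping."]))
```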
1 code implementation • 24 May 2023 • Anshita Gupta, Debanjan Mondal, Akshay Krishna Sheshadri, Wenlong Zhao, Xiang Lorraine Li, Sarah Wiegreffe, Niket Tandon
However, these editing methods have only been evaluated on statements about encyclopedic knowledge with a single correct answer.
1 code implementation • 24 May 2023 • Sarah Wiegreffe, Matthew Finlayson, Oyvind Tafjord, Peter Clark, Ashish Sabharwal
For example, both normalization and prompting methods for reducing surface form competition (SFC) can be ineffective or even detrimental to task performance for some LMs.
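As one illustration of the kind of normalization at issue, here is a minimal sketch, assuming a Hugging Face causal LM ("gpt2" is an illustrative placeholder, not a model from the paper), of scoring answer choices by raw versus length-normalized log-probability; length normalization is one common heuristic for mitigating SFC:

```python
# Sketch: score MCQA answer choices by summed token log-probability,
# then length-normalize (divide by token count).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def choice_logprob(question: str, choice: str):
    """Return (sum of answer-token log-probs, number of answer tokens)."""
    q_len = tok(question, return_tensors="pt").input_ids.shape[1]
    ids = tok(question + " " + choice, return_tensors="pt").input_ids
    with torch.no_grad():
        log_probs = torch.log_softmax(model(ids).logits, dim=-1)
    total = 0.0
    # assumes the question's tokenization is a prefix of the full string's
    for pos in range(q_len, ids.shape[1]):
        total += log_probs[0, pos - 1, ids[0, pos]].item()
    return total, ids.shape[1] - q_len

question = "Q: What do bees make? A:"
for choice in ["honey", "a large quantity of honey"]:
    lp, n = choice_logprob(question, choice)
    print(f"{choice!r}: raw={lp:.2f}  length-normalized={lp / n:.2f}")
```

Longer surface forms accumulate more negative log-probability, so the raw score penalizes them; dividing by token count removes that length bias.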
3 code implementations • NeurIPS 2023 • Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, Shashank Gupta, Bodhisattwa Prasad Majumder, Katherine Hermann, Sean Welleck, Amir Yazdanbakhsh, Peter Clark
Motivated by how humans refine their written text, we introduce Self-Refine, an approach for improving initial outputs from LLMs through iterative feedback and refinement.
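A minimal sketch of this generate-feedback-refine loop; `generate` is a hypothetical stand-in for an LLM call, and the prompt wording and stopping criterion are illustrative rather than the paper's exact setup:

```python
# Sketch of the generate -> feedback -> refine loop.
def generate(prompt: str) -> str:
    raise NotImplementedError("replace with a real LLM call")

def self_refine(task: str, max_iters: int = 3) -> str:
    output = generate(f"Task: {task}\nWrite an initial answer.")
    for _ in range(max_iters):
        feedback = generate(
            f"Task: {task}\nAnswer: {output}\n"
            "Give concrete feedback for improving this answer, "
            "or reply DONE if no changes are needed."
        )
        if feedback.strip() == "DONE":
            break  # the model judges its own output good enough
        output = generate(
            f"Task: {task}\nAnswer: {output}\nFeedback: {feedback}\n"
            "Rewrite the answer to address the feedback."
        )
    return output
```

The same model plays generator, critic, and reviser, so no additional training or supervision is needed.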
no code implementations • 16 Apr 2022 • Kaige Xie, Sarah Wiegreffe, Mark Riedl
We show that decomposition is an effective form of probing QA systems as well as a promising approach to explanation generation.
1 code implementation • NAACL 2022 • Sarah Wiegreffe, Jack Hessel, Swabha Swayamdipta, Mark Riedl, Yejin Choi
We create a pipeline that combines GPT-3 with a supervised filter that incorporates binary acceptability judgments from humans in the loop.
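A minimal sketch of the overgenerate-then-filter idea, assuming hypothetical `generate_candidates` (an LLM sampler) and `acceptability_score` (a classifier trained on the binary human judgments) callables:

```python
# Sketch of overgeneration plus a supervised acceptability filter.
from typing import Callable, List

def filtered_explanations(
    prompt: str,
    generate_candidates: Callable[[str, int], List[str]],
    acceptability_score: Callable[[str], float],
    n_candidates: int = 8,
    threshold: float = 0.5,
) -> List[str]:
    candidates = generate_candidates(prompt, n_candidates)
    # keep only candidates the learned filter deems acceptable
    return [c for c in candidates if acceptability_score(c) >= threshold]
```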
1 code implementation • 4 May 2021 • Xiangyu Peng, Siyan Li, Sarah Wiegreffe, Mark Riedl
Transformer-based language model approaches to automated story generation currently provide state-of-the-art results.
no code implementations • 24 Feb 2021 • Sarah Wiegreffe, Ana Marasović
Explainable NLP (ExNLP) has increasingly focused on collecting human-annotated textual explanations.
1 code implementation • EMNLP 2021 • Sarah Wiegreffe, Ana Marasović, Noah A. Smith
In interpretable NLP, we require faithful rationales that reflect the model's decision-making process for an explained instance.
2 code implementations • ACL 2020 • Sarthak Jain, Sarah Wiegreffe, Yuval Pinter, Byron C. Wallace
In NLP, this often entails extracting snippets of an input text "responsible for" the corresponding model output; when such a snippet comprises tokens that indeed informed the model's prediction, it is a faithful explanation.
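One way to guarantee such faithfulness is by construction: if the predictor sees only the extracted snippet, the snippet necessarily informed the prediction. A minimal sketch, with hypothetical `saliency` and `predict` stand-ins for a token scorer and a classifier:

```python
# Sketch of a rationale that is faithful by construction.
from typing import Callable, List, Tuple

def rationalize_then_predict(
    tokens: List[str],
    saliency: Callable[[List[str]], List[float]],
    predict: Callable[[List[str]], str],
    k: int = 5,
) -> Tuple[List[str], str]:
    scores = saliency(tokens)
    top = sorted(range(len(tokens)), key=scores.__getitem__, reverse=True)[:k]
    rationale = [tokens[i] for i in sorted(top)]  # keep original order
    return rationale, predict(rationale)  # prediction uses only the rationale
```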
2 code implementations • IJCNLP 2019 • Sarah Wiegreffe, Yuval Pinter
We show that even when reliable adversarial distributions can be found, they don't perform well on the simple diagnostic, indicating that prior work does not disprove the usefulness of attention mechanisms for explainability.
no code implementations • WS 2019 • Sarah Wiegreffe, Edward Choi, Sherry Yan, Jimeng Sun, Jacob Eisenstein
The text of clinical notes can be a valuable source of patient information and clinical assessments.
3 code implementations • NAACL 2018 • James Mullenbach, Sarah Wiegreffe, Jon Duke, Jimeng Sun, Jacob Eisenstein
Our method aggregates information across the document using a convolutional neural network, and uses an attention mechanism to select the most relevant segments for each of the thousands of possible codes.
Ranked #11 on Medical Code Prediction on MIMIC-III
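A minimal sketch, assuming PyTorch, of the described design of convolutional encoding plus per-label attention; all sizes and names are illustrative, not the authors' exact configuration:

```python
# Sketch: a CNN encodes the document, and one learned attention query
# per label selects the segments most relevant to that label.
import torch
import torch.nn as nn

class LabelwiseAttentionCNN(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=100,
                 num_filters=50, kernel_size=5, num_labels=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size,
                              padding=kernel_size // 2)
        # one learned attention query vector per label
        self.label_queries = nn.Parameter(torch.randn(num_labels, num_filters))
        self.out = nn.Linear(num_filters, num_labels)

    def forward(self, token_ids):                              # (batch, seq)
        h = torch.tanh(self.conv(self.embed(token_ids).transpose(1, 2)))
        attn = torch.softmax(self.label_queries @ h, dim=-1)   # (batch, labels, seq)
        ctx = attn @ h.transpose(1, 2)                         # (batch, labels, filters)
        # score each label from that label's attended representation
        return (ctx * self.out.weight).sum(-1) + self.out.bias  # (batch, labels)

model = LabelwiseAttentionCNN()
print(model(torch.randint(0, 5000, (2, 30))).shape)  # torch.Size([2, 50])
```

Because each label attends over the document independently, the attention weights double as a per-code explanation of which text segments drove that code's prediction.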