Search Results for author: Peter Hase

Found 20 papers, 17 papers with code

The Unreasonable Effectiveness of Easy Training Data for Hard Tasks

1 code implementation · 12 Jan 2024 · Peter Hase, Mohit Bansal, Peter Clark, Sarah Wiegreffe

In this paper, we present the surprising conclusion that current language models often generalize relatively well from easy to hard data, even performing as well as "oracle" models trained on hard data.

General Knowledge · In-Context Learning +1
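
A rough way to quantify this kind of easy-to-hard generalization is to ask how much of the gap between an unsupervised baseline and a hard-data "oracle" is closed by training on easy data alone. The sketch below illustrates that ratio; the function name and the accuracy values are illustrative assumptions, not figures from the paper.

```python
# Illustrative sketch (not the paper's code): fraction of the baseline-to-oracle
# gap on hard test data that a model trained only on easy data recovers.
def supervision_gap_recovered(acc_base: float, acc_easy: float, acc_hard: float) -> float:
    """acc_base: unsupervised baseline on hard test data,
    acc_easy: model trained on easy data, tested on hard data,
    acc_hard: oracle model trained on hard data, tested on hard data."""
    if acc_hard == acc_base:
        return float("nan")  # no gap to recover
    return (acc_easy - acc_base) / (acc_hard - acc_base)

# Hypothetical accuracies, purely for illustration:
print(supervision_gap_recovered(acc_base=0.55, acc_easy=0.70, acc_hard=0.72))  # ~0.88
```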

Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks

1 code implementation · 29 Sep 2023 · Vaidehi Patil, Peter Hase, Mohit Bansal

Experimentally, we show that even state-of-the-art model editing methods such as ROME struggle to truly delete factual information from models like GPT-J, as our whitebox and blackbox attacks can recover "deleted" information from an edited model 38% of the time.

Model Editing
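
The whitebox attack idea can be illustrated with a logit-lens style probe: even when the edited model's final output no longer yields the "deleted" answer, intermediate layers may still rank it highly. Below is a minimal sketch of such a probe, with GPT-2 as a stand-in model; the layer sweep and top-k threshold are assumptions for illustration, not the paper's exact attack.

```python
# Hedged sketch of a whitebox "logit lens" probe: decode intermediate hidden
# states through the LM head and check whether a supposedly deleted answer
# still appears in the top-k candidates at some layer. GPT-2 is only a stand-in;
# the layer sweep and k are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tok = AutoTokenizer.from_pretrained("gpt2")

def answer_in_intermediate_topk(prompt: str, answer: str, k: int = 20) -> bool:
    answer_id = tok(" " + answer, add_special_tokens=False).input_ids[0]
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, output_hidden_states=True)
    for hidden in out.hidden_states:                     # one tensor per layer
        h_last = model.transformer.ln_f(hidden[:, -1])   # final-token state
        topk = model.lm_head(h_last).topk(k).indices[0]  # decode via the LM head
        if answer_id in topk:
            return True
    return False

print(answer_in_intermediate_topk("The Eiffel Tower is located in", "Paris"))
```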

Can Language Models Teach Weaker Agents? Teacher Explanations Improve Students via Personalization

1 code implementation · 15 Jun 2023 · Swarnadeep Saha, Peter Hase, Mohit Bansal

We first show that teacher LLMs can indeed intervene on student reasoning to improve their performance.

Are Hard Examples also Harder to Explain? A Study with Human and Model-Generated Explanations

1 code implementation · 14 Nov 2022 · Swarnadeep Saha, Peter Hase, Nazneen Rajani, Mohit Bansal

We observe that (1) GPT-3 explanations are as grammatical as human explanations regardless of the hardness of the test samples, (2) for easy examples, GPT-3 generates highly supportive explanations but human explanations are more generalizable, and (3) for hard examples, human explanations are significantly better than GPT-3 explanations both in terms of label-supportiveness and generalizability judgements.

Summarization Programs: Interpretable Abstractive Summarization with Neural Modular Trees

1 code implementation · 21 Sep 2022 · Swarnadeep Saha, Shiyue Zhang, Peter Hase, Mohit Bansal

We demonstrate that SP-Search effectively represents the generative process behind human summaries using modules that are typically faithful to their intended behavior.

Abstractive Text Summarization · Sentence +1

VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives

1 code implementation · 22 Jun 2022 · Zhuofan Ying, Peter Hase, Mohit Bansal

In this paper, we show that model FI supervision can meaningfully improve VQA model accuracy as well as performance on several Right-for-the-Right-Reason (RRR) metrics by optimizing for four key model objectives: (1) accurate predictions given limited but sufficient information (Sufficiency); (2) max-entropy predictions given no important information (Uncertainty); (3) invariance of predictions to changes in unimportant features (Invariance); and (4) alignment between model FI explanations and human FI explanations (Plausibility).

Feature Importance · Question Answering +2
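
A minimal sketch of how these four objectives can be expressed as auxiliary losses is given below, assuming a generic classifier and precomputed importance masks; the masking scheme, loss forms, and equal weighting are illustrative assumptions rather than the VisFIS implementation.

```python
# Hedged sketch of the four objectives as auxiliary losses (Sufficiency,
# Uncertainty, Invariance, Plausibility). The masks, loss forms, and equal
# weighting are illustrative assumptions, not the VisFIS implementation.
import torch
import torch.nn.functional as F

def rrr_losses(model, x, y, human_fi, model_fi, keep_mask, drop_mask):
    """x: inputs, y: labels, *_fi: feature-importance scores,
    keep_mask: keeps only important features, drop_mask: removes them."""
    # (1) Sufficiency: correct prediction from the important features alone
    suff = F.cross_entropy(model(x * keep_mask), y)
    # (2) Uncertainty: near-uniform prediction once important features are removed
    logits_drop = model(x * drop_mask)
    uniform = torch.full_like(logits_drop, 1.0 / logits_drop.size(-1))
    unc = F.kl_div(F.log_softmax(logits_drop, -1), uniform, reduction="batchmean")
    # (3) Invariance: prediction unchanged when unimportant features are perturbed
    perturbed = x + torch.randn_like(x) * (1 - keep_mask)
    inv = F.kl_div(F.log_softmax(model(perturbed), -1),
                   F.softmax(model(x), -1), reduction="batchmean")
    # (4) Plausibility: model FI explanations align with human FI explanations
    plaus = 1 - F.cosine_similarity(model_fi, human_fi, dim=-1).mean()
    return suff + unc + inv + plaus

# Toy usage with a linear classifier on random data:
torch.manual_seed(0)
model = torch.nn.Linear(10, 3)
x, y = torch.randn(4, 10), torch.randint(0, 3, (4,))
keep = (torch.rand(4, 10) > 0.5).float()
fi = torch.rand(4, 10)
print(rrr_losses(model, x, y, fi, fi.clone(), keep_mask=keep, drop_mask=1 - keep))
```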

GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models

2 code implementations · 14 Mar 2022 · Archiki Prasad, Peter Hase, Xiang Zhou, Mohit Bansal

Providing natural language instructions in prompts is a useful new paradigm for improving task performance of large language models in a zero-shot setting.
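
The general recipe, roughly: propose small phrase-level edits to the instruction, score each candidate by task performance on a small labeled set, and keep the best edit iteratively, with no gradients involved. The sketch below uses simplified edit operators (delete, swap) and a toy scorer as stand-ins for GrIPS's actual operators and LLM-based scoring.

```python
# Hedged sketch of gradient-free, edit-based instruction search: propose
# phrase-level edits, score candidates, keep the best. The edit operators and
# the toy scorer are simplified assumptions, not GrIPS's exact components.
import random

def phrase_edits(instruction: str) -> list[str]:
    phrases = instruction.split(", ")
    candidates = []
    for i in range(len(phrases)):
        # delete one phrase
        candidates.append(", ".join(phrases[:i] + phrases[i + 1:]))
        # swap with the next phrase
        if i + 1 < len(phrases):
            swapped = phrases.copy()
            swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
            candidates.append(", ".join(swapped))
    return [c for c in candidates if c]

def search(instruction: str, score_fn, steps: int = 5) -> str:
    best, best_score = instruction, score_fn(instruction)
    for _ in range(steps):
        for cand in phrase_edits(best):
            s = score_fn(cand)
            if s > best_score:
                best, best_score = cand, s
    return best

# Toy scorer standing in for dev-set accuracy of an LLM prompted with the instruction:
toy_score = lambda instr: -abs(len(instr) - 40) + random.random()
print(search("Read the review, think step by step, answer with positive or negative", toy_score))
```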

Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs

1 code implementation · 26 Nov 2021 · Peter Hase, Mona Diab, Asli Celikyilmaz, Xian Li, Zornitsa Kozareva, Veselin Stoyanov, Mohit Bansal, Srinivasan Iyer

In this paper, we discuss approaches to detecting when models have beliefs about the world, and we improve on methods for updating model beliefs to be more truthful, with a focus on methods based on learned optimizers or hypernetworks.
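
One crude way to probe for a "belief" is sketched below: compare the probability a causal language model assigns to a statement with the probability it assigns to a contradicting statement. This is only an illustration of belief detection with GPT-2 as a stand-in; it is not the paper's detection or updating method, which centers on learned optimizers and hypernetworks.

```python
# Hedged belief-probe sketch (not the paper's method): a model's relative
# probability for a statement versus a contradiction as a rough belief signal.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tok = AutoTokenizer.from_pretrained("gpt2")

def sequence_logprob(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss        # mean negative log-likelihood
    return -loss.item() * (ids.size(1) - 1)       # approximate total log-prob

statement = "The capital of France is Paris."
contradiction = "The capital of France is Rome."
print(sequence_logprob(statement) > sequence_logprob(contradiction))  # expect True
```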

Low-Cost Algorithmic Recourse for Users With Uncertain Cost Functions

1 code implementation · 1 Nov 2021 · Prateek Yadav, Peter Hase, Mohit Bansal

Current approaches try to optimize for the cost incurred by users when adopting a recourse, but they assume that all users share the same cost function.

Fairness
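
One way to relax that assumption, sketched below, is to score candidate recourses by their expected cost under a distribution of sampled per-user cost functions rather than a single shared one. The features, candidate recourses, linear cost model, and Dirichlet weight distribution are illustrative assumptions, not the paper's method.

```python
# Hedged sketch: choose a recourse by expected cost over sampled user cost
# functions instead of assuming one shared cost function. Feature names,
# candidates, and the cost distribution are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Candidate recourses: required change per feature (e.g., income, debt, savings).
candidates = {
    "raise_income": np.array([0.4, 0.0, 0.0]),
    "reduce_debt":  np.array([0.0, 0.3, 0.0]),
    "mixed_action": np.array([0.2, 0.1, 0.1]),
}

# Sample per-user feature weights: different users find different changes costly.
user_weights = rng.dirichlet(alpha=[2.0, 2.0, 2.0], size=500)  # shape (500, 3)

def expected_cost(delta: np.ndarray) -> float:
    # linear cost model: cost = weighted sum of required feature changes
    return float((user_weights @ delta).mean())

best = min(candidates, key=lambda name: expected_cost(candidates[name]))
print({name: round(expected_cost(d), 3) for name, d in candidates.items()}, "->", best)
```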

The Out-of-Distribution Problem in Explainability and Search Methods for Feature Importance Explanations

1 code implementation · NeurIPS 2021 · Peter Hase, Harry Xie, Mohit Bansal

In this paper, we study several under-explored dimensions of FI explanations, providing conceptual and empirical improvements for this form of explanation.

counterfactual · Feature Importance +2
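
As a rough illustration of a removal-based search for feature-importance explanations, the sketch below greedily selects the token positions whose masking most reduces a black-box model's confidence, replacing tokens with a mask symbol rather than deleting them so the counterfactual inputs stay closer to the training distribution. The scorer and search budget are hypothetical, not the paper's models or proposed search algorithms.

```python
# Hedged sketch of a greedy feature-importance search with in-distribution
# replacement: mask tokens instead of deleting them, keep the positions whose
# masking hurts confidence most. `predict_proba` is a hypothetical scorer.

def greedy_fi_search(tokens, predict_proba, mask="[MASK]", budget=3):
    """Return up to `budget` token positions whose masking hurts confidence most."""
    chosen, current = [], list(tokens)
    base = predict_proba(current)
    for _ in range(budget):
        scored = []
        for i in range(len(tokens)):
            if i in chosen:
                continue
            trial = list(current)
            trial[i] = mask
            scored.append((base - predict_proba(trial), i))
        drop, i = max(scored)
        chosen.append(i)
        current[i] = mask
    return chosen

# Toy scorer: confidence proportional to how many sentiment-bearing words remain.
toy = lambda toks: 0.5 + 0.2 * sum(t in {"great", "loved"} for t in toks)
print(greedy_fi_search("I loved this great movie".split(), toy))  # picks 'great' (3) and 'loved' (1) first
```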

When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data

1 code implementation · LNLS (ACL) 2022 · Peter Hase, Mohit Bansal

In order to carefully control important properties of the data and explanations, we introduce a synthetic dataset for experiments, and we also make use of three existing datasets with explanations: e-SNLI, TACRED, and SemEval.

Retrieval

Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior?

1 code implementation · ACL 2020 · Peter Hase, Mohit Bansal

Through two kinds of simulation tests involving text and tabular data, we evaluate five explanation methods: (1) LIME, (2) Anchor, (3) Decision Boundary, (4) a Prototype model, and (5) a Composite approach that combines explanations from each method.

counterfactual · tabular-classification
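
For reference, LIME (the first of the five methods) fits a local surrogate model to produce per-word weights; a minimal, self-contained usage example on a toy sentiment classifier is sketched below. The toy data and classifier are illustrative assumptions, not the paper's simulation-test setup.

```python
# Minimal LIME usage example on a toy text classifier (illustrative only; not
# the paper's simulation-test setup). Requires: scikit-learn, lime.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from lime.lime_text import LimeTextExplainer

texts = ["great movie, loved it", "wonderful acting", "terrible plot", "boring and bad"]
labels = [1, 1, 0, 0]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

explainer = LimeTextExplainer(class_names=["negative", "positive"])
exp = explainer.explain_instance(
    "loved the acting but the plot was bad",
    clf.predict_proba,          # black-box probability function
    num_features=4,
)
print(exp.as_list())            # [(word, weight), ...] for class 1 ("positive")
```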

Interpretable Image Recognition with Hierarchical Prototypes

1 code implementation · 25 Jun 2019 · Peter Hase, Chaofan Chen, Oscar Li, Cynthia Rudin

Hence, we may find distinct explanations for the prediction an image receives at each level of the taxonomy.

General Classification
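
A hedged sketch of the general idea: keep a separate set of prototypes at each level of a class taxonomy and classify each level by the similarity of the image embedding to that level's prototypes, so every level yields its own explanation. Dimensions, the taxonomy sizes, and the squared-distance similarity are assumptions, not the paper's architecture.

```python
# Hedged sketch of classification with prototypes at each taxonomy level: an
# embedding is compared to coarse-level and fine-level prototypes, giving a
# separately explainable prediction at every level. Sizes are illustrative.
import torch
import torch.nn as nn

class HierarchicalPrototypes(nn.Module):
    def __init__(self, dim=64, classes_per_level=(3, 9)):
        super().__init__()
        # one learnable prototype per class at each level of the taxonomy
        self.prototypes = nn.ParameterList(
            [nn.Parameter(torch.randn(c, dim)) for c in classes_per_level]
        )

    def forward(self, z):               # z: (batch, dim) image embeddings
        logits = []
        for protos in self.prototypes:
            d = torch.cdist(z, protos)  # distance to each prototype at this level
            logits.append(-d ** 2)      # closer prototype => higher class score
        return logits                   # [coarse logits, fine logits]

model = HierarchicalPrototypes()
coarse, fine = model(torch.randn(2, 64))
print(coarse.shape, fine.shape)         # torch.Size([2, 3]) torch.Size([2, 9])
```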
