Search Results for author: Keegan Hines

Found 6 papers, 1 paper with code

Defending Against Indirect Prompt Injection Attacks With Spotlighting

no code implementations • 20 Mar 2024 • Keegan Hines, Gary Lopez, Matthew Hall, Federico Zarfati, Yonatan Zunger, Emre Kiciman

Indirect prompt injection attacks take advantage of this vulnerability by embedding adversarial instructions into untrusted data being processed alongside user commands.

Prompt Engineering

Reckoning with the Disagreement Problem: Explanation Consensus as a Training Objective

no code implementations • 23 Mar 2023 • Avi Schwarzschild, Max Cembalest, Karthik Rao, Keegan Hines, John Dickerson

We observe on three datasets that we can train a model with this loss term to improve explanation consensus on unseen data, and see improved consensus between explainers other than those used in the loss term.

Repairing Regressors for Fair Binary Classification at Any Decision Threshold

no code implementations • 14 Mar 2022 • Kweku Kwegyir-Aggrey, A. Feder Cooper, Jessica Dai, John Dickerson, Keegan Hines, Suresh Venkatasubramanian

We study the problem of post-processing a supervised machine-learned regressor to maximize fair binary classification at all decision thresholds.

Binary Classification, Classification, +1

Counterfactual Explanations for Machine Learning: Challenges Revisited

no code implementations • 14 Jun 2021 • Sahil Verma, John Dickerson, Keegan Hines

Counterfactual explanations (CFEs) are an emerging technique under the umbrella of interpretability of machine learning (ML) models.

BIG-bench Machine Learning, counterfactual

Amortized Generation of Sequential Algorithmic Recourses for Black-box Models

1 code implementation • 7 Jun 2021 • Sahil Verma, Keegan Hines, John P. Dickerson

We propose a novel stochastic-control-based approach that generates sequential ARs, that is, ARs that allow x to move stochastically and sequentially across intermediate states to a final state x'.
