no code implementations • 20 Mar 2024 • Keegan Hines, Gary Lopez, Matthew Hall, Federico Zarfati, Yonatan Zunger, Emre Kiciman
Indirect prompt injection attacks take advantage of this vulnerability by embedding adversarial instructions into untrusted data being processed alongside user commands.
no code implementations • 23 Mar 2023 • Avi Schwarzschild, Max Cembalest, Karthik Rao, Keegan Hines, John Dickerson
We observe on three datasets that we can train a model with this loss term to improve explanation consensus on unseen data, and see improved consensus between explainers other than those used in the loss term.
no code implementations • 14 Mar 2022 • Kweku Kwegyir-Aggrey, A. Feder Cooper, Jessica Dai, John Dickerson, Keegan Hines, Suresh Venkatasubramanian
We study the problem of post-processing a supervised machine-learned regressor to maximize fair binary classification at all decision thresholds.
no code implementations • 14 Jun 2021 • Sahil Verma, John Dickerson, Keegan Hines
Counterfactual explanations (CFEs) are an emerging technique under the umbrella of interpretability of machine learning (ML) models.
1 code implementation • 7 Jun 2021 • Sahil Verma, Keegan Hines, John P. Dickerson
We propose a novel stochastic-control-based approach that generates sequential algorithmic recourses (ARs), that is, ARs that allow x to move stochastically and sequentially across intermediate states to a final state x'.
no code implementations • 15 Aug 2019 • Anh Truong, Austin Walters, Jeremy Goodsitt, Keegan Hines, C. Bayan Bruss, Reza Farivar
There has been considerable growth and interest in industrial applications of machine learning (ML) in recent years.