Search Results for author: Himabindu Lakkaraju

Found 64 papers, 21 papers with code

Robust Black Box Explanations Under Distribution Shift

no code implementations ICML 2020 Himabindu Lakkaraju, Nino Arsov, Osbert Bastani

As machine learning black boxes are increasingly being deployed in real-world applications, there has been a growing interest in developing post hoc explanations that summarize the behaviors of these black box models.

Manipulating Large Language Models to Increase Product Visibility

2 code implementations11 Apr 2024 Aounon Kumar, Himabindu Lakkaraju

We demonstrate that adding a strategic text sequence (STS) -- a carefully crafted message -- to a product's information page can significantly increase its likelihood of being listed as the LLM's top recommendation.

STS

Data Poisoning Attacks on Off-Policy Policy Evaluation Methods

no code implementations6 Apr 2024 Elita Lobo, Harvineet Singh, Marek Petrik, Cynthia Rudin, Himabindu Lakkaraju

Off-policy Evaluation (OPE) methods are a crucial tool for evaluating policies in high-stakes domains such as healthcare, where exploration is often infeasible, unethical, or expensive.

Data Poisoning Off-policy evaluation

Towards Safe and Aligned Large Language Models for Medicine

no code implementations6 Mar 2024 Tessa Han, Aounon Kumar, Chirag Agarwal, Himabindu Lakkaraju

The capabilities of large language models (LLMs) have been progressing at a breathtaking speed, leaving even their own developers grappling with the depth of their potential and risks.

General Knowledge

Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems

no code implementations27 Feb 2024 Zhenting Qi, HANLIN ZHANG, Eric Xing, Sham Kakade, Himabindu Lakkaraju

Retrieval-Augmented Generation (RAG) improves pre-trained models by incorporating external knowledge at test time to enable customized adaptation.

Instruction Following Retrieval

OpenHEXAI: An Open-Source Framework for Human-Centered Evaluation of Explainable Machine Learning

no code implementations20 Feb 2024 Jiaqi Ma, Vivian Lai, Yiming Zhang, Chacha Chen, Paul Hamilton, Davor Ljubenkov, Himabindu Lakkaraju, Chenhao Tan

However, properly evaluating the effectiveness of the XAI methods inevitably requires the involvement of human subjects, and conducting human-centered benchmarks is challenging in a number of ways: designing and implementing user studies is complex; numerous design choices in the design space of user study lead to problems of reproducibility; and running user studies can be challenging and even daunting for machine learning researchers.

Decision Making Fairness

Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE)

1 code implementation16 Feb 2024 Usha Bhalla, Alex Oesterling, Suraj Srinivas, Flavio P. Calmon, Himabindu Lakkaraju

CLIP embeddings have demonstrated remarkable performance across a wide range of computer vision tasks.

Model Editing

Towards Uncovering How Large Language Model Works: An Explainability Perspective

no code implementations16 Feb 2024 Haiyan Zhao, Fan Yang, Bo Shen, Himabindu Lakkaraju, Mengnan Du

Large language models (LLMs) have led to breakthroughs in language tasks, yet the internal mechanisms that enable their remarkable generalization and reasoning abilities remain opaque.

Hallucination Language Modelling +3

Understanding the Effects of Iterative Prompting on Truthfulness

no code implementations9 Feb 2024 Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju

The development of Large Language Models (LLMs) has notably transformed numerous sectors, offering impressive text generation capabilities.

Text Generation

Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models

no code implementations7 Feb 2024 Chirag Agarwal, Sree Harsha Tanneru, Himabindu Lakkaraju

We highlight that the current trend towards increasing the plausibility of explanations, primarily driven by the demand for user-friendly interfaces, may come at the cost of diminishing their faithfulness.

Decision Making

A Study on the Calibration of In-context Learning

no code implementations7 Dec 2023 HANLIN ZHANG, Yi-Fan Zhang, Yaodong Yu, Dhruv Madeka, Dean Foster, Eric Xing, Himabindu Lakkaraju, Sham Kakade

Accurate uncertainty quantification is crucial for the safe deployment of machine learning models, and prior research has demonstrated improvements in the calibration of modern language models (LMs).

In-Context Learning Natural Language Understanding +1

Quantifying Uncertainty in Natural Language Explanations of Large Language Models

1 code implementation6 Nov 2023 Sree Harsha Tanneru, Chirag Agarwal, Himabindu Lakkaraju

In this work, we make one of the first attempts at quantifying the uncertainty in explanations of LLMs.

In-Context Unlearning: Language Models as Few Shot Unlearners

1 code implementation11 Oct 2023 Martin Pawelczyk, Seth Neel, Himabindu Lakkaraju

In this work, we propose a new class of unlearning methods for LLMs we call ''In-Context Unlearning'', providing inputs in context and without having to update model parameters.

Machine Unlearning

Are Large Language Models Post Hoc Explainers?

1 code implementation9 Oct 2023 Nicholas Kroeger, Dan Ley, Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju

To this end, several approaches have been proposed in recent literature to explain the behavior of complex predictive models in a post hoc fashion.

Explainable artificial intelligence Explainable Artificial Intelligence (XAI) +1

On the Trade-offs between Adversarial Robustness and Actionable Explanations

no code implementations28 Sep 2023 Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju

As machine learning models are increasingly being employed in various high-stakes settings, it becomes important to ensure that predictions of these models are not only adversarially robust, but also readily explainable to relevant stakeholders.

Adversarial Robustness

Certifying LLM Safety against Adversarial Prompting

1 code implementation6 Sep 2023 Aounon Kumar, Chirag Agarwal, Suraj Srinivas, Aaron Jiaxun Li, Soheil Feizi, Himabindu Lakkaraju

We defend against three attack modes: i) adversarial suffix, where an adversarial sequence is appended at the end of a harmful prompt; ii) adversarial insertion, where the adversarial sequence is inserted anywhere in the middle of the prompt; and iii) adversarial infusion, where adversarial tokens are inserted at arbitrary positions in the prompt, not necessarily as a contiguous block.

Adversarial Attack Language Modelling

Accurate, Explainable, and Private Models: Providing Recourse While Minimizing Training Data Leakage

no code implementations8 Aug 2023 Catherine Huang, Chelse Swoopes, Christina Xiao, Jiaqi Ma, Himabindu Lakkaraju

We present two novel methods to generate differentially private recourse: Differentially Private Model (DPM) and Laplace Recourse (LR).

Discriminative Feature Attributions: Bridging Post Hoc Explainability and Inherent Interpretability

1 code implementation NeurIPS 2023 Usha Bhalla, Suraj Srinivas, Himabindu Lakkaraju

This strategy naturally combines the ease of use of post hoc explanations with the faithfulness of inherently interpretable models.

Attribute

Efficient Estimation of Average-Case Robustness for Multi-Class Classification

no code implementations26 Jul 2023 Tessa Han, Suraj Srinivas, Himabindu Lakkaraju

These estimators linearize models in the local region around an input and analytically compute the robustness of the resulting linear models.

Multi-class Classification

Analyzing Chain-of-Thought Prompting in Large Language Models via Gradient-based Feature Attributions

no code implementations25 Jul 2023 Skyler Wu, Eric Meng Shen, Charumathi Badrinath, Jiaqi Ma, Himabindu Lakkaraju

Chain-of-thought (CoT) prompting has been shown to empirically improve the accuracy of large language models (LLMs) on various question answering tasks.

Question Answering

On Minimizing the Impact of Dataset Shifts on Actionable Explanations

no code implementations11 Jun 2023 Anna P. Meyer, Dan Ley, Suraj Srinivas, Himabindu Lakkaraju

To this end, we conduct rigorous theoretical analysis to demonstrate that model curvature, weight decay parameters while training, and the magnitude of the dataset shift are key factors that determine the extent of explanation (in)stability.

Consistent Explanations in the Face of Model Indeterminacy via Ensembling

no code implementations9 Jun 2023 Dan Ley, Leonard Tang, Matthew Nazari, Hongjin Lin, Suraj Srinivas, Himabindu Lakkaraju

This work addresses the challenge of providing consistent explanations for predictive models in the presence of model indeterminacy, which arises due to the existence of multiple (nearly) equally well-performing models for a given dataset and task.

Word-Level Explanations for Analyzing Bias in Text-to-Image Models

no code implementations3 Jun 2023 Alexander Lin, Lucas Monteiro Paes, Sree Harsha Tanneru, Suraj Srinivas, Himabindu Lakkaraju

We introduce a method for computing scores for each word in the prompt; these scores represent its influence on biases in the model's output.

Sentence

Towards Bridging the Gaps between the Right to Explanation and the Right to be Forgotten

no code implementations8 Feb 2023 Satyapriya Krishna, Jiaqi Ma, Himabindu Lakkaraju

The Right to Explanation and the Right to be Forgotten are two important principles outlined to regulate algorithmic decision making and data usage in real-world applications.

Decision Making

On the Privacy Risks of Algorithmic Recourse

1 code implementation10 Nov 2022 Martin Pawelczyk, Himabindu Lakkaraju, Seth Neel

As predictive models are increasingly being employed to make consequential decisions, there is a growing emphasis on developing techniques that can provide algorithmic recourse to affected individuals.

Towards Robust Off-Policy Evaluation via Human Inputs

no code implementations18 Sep 2022 Harvineet Singh, Shalmali Joshi, Finale Doshi-Velez, Himabindu Lakkaraju

When deployment environments are expected to undergo changes (that is, dataset shifts), it is important for OPE methods to perform robust evaluation of the policies amidst such changes.

Multi-Armed Bandits Off-policy evaluation

Evaluating Explainability for Graph Neural Networks

1 code implementation19 Aug 2022 Chirag Agarwal, Owen Queen, Himabindu Lakkaraju, Marinka Zitnik

As post hoc explanations are increasingly used to understand the behavior of graph neural networks (GNNs), it becomes crucial to evaluate the quality and reliability of GNN explanations.

TalkToModel: Explaining Machine Learning Models with Interactive Natural Language Conversations

1 code implementation8 Jul 2022 Dylan Slack, Satyapriya Krishna, Himabindu Lakkaraju, Sameer Singh

In real-world evaluations with humans, 73% of healthcare workers (e. g., doctors and nurses) agreed they would use TalkToModel over baseline point-and-click systems for explainability in a disease prediction task, and 85% of ML professionals agreed TalkToModel was easier to use for computing explanations.

BIG-bench Machine Learning Disease Prediction +1

OpenXAI: Towards a Transparent Evaluation of Model Explanations

2 code implementations22 Jun 2022 Chirag Agarwal, Dan Ley, Satyapriya Krishna, Eshika Saxena, Martin Pawelczyk, Nari Johnson, Isha Puri, Marinka Zitnik, Himabindu Lakkaraju

OpenXAI comprises of the following key components: (i) a flexible synthetic data generator and a collection of diverse real-world datasets, pre-trained models, and state-of-the-art feature attribution methods, and (ii) open-source implementations of eleven quantitative metrics for evaluating faithfulness, stability (robustness), and fairness of explanation methods, in turn providing comparisons of several explanation methods across a wide variety of metrics, models, and datasets.

Benchmarking Explainable Artificial Intelligence (XAI) +1

Efficiently Training Low-Curvature Neural Networks

2 code implementations14 Jun 2022 Suraj Srinivas, Kyle Matoba, Himabindu Lakkaraju, Francois Fleuret

To achieve this, we minimize a data-independent upper bound on the curvature of a neural network, which decomposes overall curvature in terms of curvatures and slopes of its constituent layers.

Adversarial Robustness

A Human-Centric Take on Model Monitoring

no code implementations6 Jun 2022 Murtuza N Shergadwala, Himabindu Lakkaraju, Krishnaram Kenthapadi

Predictive models are increasingly used to make various consequential decisions in high-stakes domains such as healthcare, finance, and policy.

Fairness

Which Explanation Should I Choose? A Function Approximation Perspective to Characterizing Post Hoc Explanations

1 code implementation2 Jun 2022 Tessa Han, Suraj Srinivas, Himabindu Lakkaraju

By bringing diverse explanation methods into a common framework, this work (1) advances the conceptual understanding of these methods, revealing their shared local function approximation objective, properties, and relation to one another, and (2) guides the use of these methods in practice, providing a principled approach to choose among methods and paving the way for the creation of new ones.

Fairness via Explanation Quality: Evaluating Disparities in the Quality of Post hoc Explanations

no code implementations15 May 2022 Jessica Dai, Sohini Upadhyay, Ulrich Aivodji, Stephen H. Bach, Himabindu Lakkaraju

We then leverage these properties to propose a novel evaluation framework which can quantitatively measure disparities in the quality of explanations output by state-of-the-art methods.

Decision Making Fairness

Rethinking Stability for Attribution-based Explanations

no code implementations14 Mar 2022 Chirag Agarwal, Nari Johnson, Martin Pawelczyk, Satyapriya Krishna, Eshika Saxena, Marinka Zitnik, Himabindu Lakkaraju

As attribution-based explanation methods are increasingly used to establish model trustworthiness in high-stakes situations, it is critical to ensure that these explanations are stable, e. g., robust to infinitesimal perturbations to an input.

Probabilistically Robust Recourse: Navigating the Trade-offs between Costs and Robustness in Algorithmic Recourse

3 code implementations13 Mar 2022 Martin Pawelczyk, Teresa Datta, Johannes van-den-Heuvel, Gjergji Kasneci, Himabindu Lakkaraju

To this end, we propose a novel objective function which simultaneously minimizes the gap between the achieved (resulting) and desired recourse invalidation rates, minimizes recourse costs, and also ensures that the resulting recourse achieves a positive model prediction.

The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective

no code implementations3 Feb 2022 Satyapriya Krishna, Tessa Han, Alex Gu, Javin Pombra, Shahin Jabbari, Steven Wu, Himabindu Lakkaraju

To this end, we first conduct interviews with data scientists to understand what constitutes disagreement between explanations generated by different methods for the same model prediction, and introduce a novel quantitative framework to formalize this understanding.

BIG-bench Machine Learning

Rethinking Explainability as a Dialogue: A Practitioner's Perspective

1 code implementation3 Feb 2022 Himabindu Lakkaraju, Dylan Slack, Yuxin Chen, Chenhao Tan, Sameer Singh

Overall, we hope our work serves as a starting place for researchers and engineers to design interactive explainability systems.

BIG-bench Machine Learning

What will it take to generate fairness-preserving explanations?

no code implementations24 Jun 2021 Jessica Dai, Sohini Upadhyay, Stephen H. Bach, Himabindu Lakkaraju

In situations where explanations of black-box models may be useful, the fairness of the black-box is also often a relevant concern.

Fairness

Feature Attributions and Counterfactual Explanations Can Be Manipulated

no code implementations23 Jun 2021 Dylan Slack, Sophie Hilgard, Sameer Singh, Himabindu Lakkaraju

As machine learning models are increasingly used in critical decision-making settings (e. g., healthcare, finance), there has been a growing emphasis on developing methods to explain model predictions.

BIG-bench Machine Learning counterfactual +1

Exploring Counterfactual Explanations Through the Lens of Adversarial Examples: A Theoretical and Empirical Analysis

no code implementations18 Jun 2021 Martin Pawelczyk, Chirag Agarwal, Shalmali Joshi, Sohini Upadhyay, Himabindu Lakkaraju

As machine learning (ML) models become more widely deployed in high-stakes applications, counterfactual explanations have emerged as key tools for providing actionable model explanations in practice.

counterfactual Counterfactual Explanation

Probing GNN Explainers: A Rigorous Theoretical and Empirical Analysis of GNN Explanation Methods

no code implementations16 Jun 2021 Chirag Agarwal, Marinka Zitnik, Himabindu Lakkaraju

As Graph Neural Networks (GNNs) are increasingly being employed in critical real-world applications, several methods have been proposed in recent literature to explain the predictions of these models.

Fairness

Counterfactual Explanations Can Be Manipulated

no code implementations NeurIPS 2021 Dylan Slack, Sophie Hilgard, Himabindu Lakkaraju, Sameer Singh

In this work, we introduce the first framework that describes the vulnerabilities of counterfactual explanations and shows how they can be manipulated.

counterfactual Counterfactual Explanation +1

Learning Under Adversarial and Interventional Shifts

no code implementations29 Mar 2021 Harvineet Singh, Shalmali Joshi, Finale Doshi-Velez, Himabindu Lakkaraju

Most of the existing work focuses on optimizing for either adversarial shifts or interventional shifts.

Towards Robust and Reliable Algorithmic Recourse

no code implementations NeurIPS 2021 Sohini Upadhyay, Shalmali Joshi, Himabindu Lakkaraju

To address this problem, we propose a novel framework, RObust Algorithmic Recourse (ROAR), that leverages adversarial training for finding recourses that are robust to model shifts.

Decision Making

Towards a Unified Framework for Fair and Stable Graph Representation Learning

3 code implementations25 Feb 2021 Chirag Agarwal, Himabindu Lakkaraju, Marinka Zitnik

In this work, we establish a key connection between counterfactual fairness and stability and leverage it to propose a novel framework, NIFTY (uNIfying Fairness and stabiliTY), which can be used with any GNN to learn fair and stable representations.

counterfactual Fairness +1

Towards the Unification and Robustness of Perturbation and Gradient Based Explanations

no code implementations21 Feb 2021 Sushant Agarwal, Shahin Jabbari, Chirag Agarwal, Sohini Upadhyay, Zhiwei Steven Wu, Himabindu Lakkaraju

As machine learning black boxes are increasingly being deployed in critical domains such as healthcare and criminal justice, there has been a growing emphasis on developing techniques for explaining these black boxes in a post hoc manner.

Algorithmic Recourse in the Wild: Understanding the Impact of Data and Model Shifts

no code implementations22 Dec 2020 Kaivalya Rawal, Ece Kamar, Himabindu Lakkaraju

Our theoretical results establish a lower bound on the probability of recourse invalidation due to model shifts, and show the existence of a tradeoff between this invalidation probability and typical notions of "cost" minimized by modern recourse generation algorithms.

Does Fair Ranking Improve Minority Outcomes? Understanding the Interplay of Human and Algorithmic Biases in Online Hiring

no code implementations1 Dec 2020 Tom Sühr, Sophie Hilgard, Himabindu Lakkaraju

In this work, we analyze various sources of gender biases in online hiring platforms, including the job context and inherent biases of employers and establish how these factors interact with ranking algorithms to affect hiring decisions.

Learning Models for Actionable Recourse

1 code implementation NeurIPS 2021 Alexis Ross, Himabindu Lakkaraju, Osbert Bastani

As machine learning models are increasingly deployed in high-stakes domains such as legal and financial decision-making, there has been growing interest in post-hoc methods for generating counterfactual explanations.

counterfactual Decision Making

When Does Uncertainty Matter?: Understanding the Impact of Predictive Uncertainty in ML Assisted Decision Making

no code implementations12 Nov 2020 Sean McGrath, Parth Mehta, Alexandra Zytek, Isaac Lage, Himabindu Lakkaraju

As machine learning (ML) models are increasingly being employed to assist human decision makers, it becomes critical to provide these decision makers with relevant inputs which can help them decide if and how to incorporate model predictions into their decision making.

Decision Making

Robust and Stable Black Box Explanations

no code implementations12 Nov 2020 Himabindu Lakkaraju, Nino Arsov, Osbert Bastani

To the best of our knowledge, this work makes the first attempt at generating post hoc explanations that are robust to a general class of adversarial perturbations that are of practical interest.

Incorporating Interpretable Output Constraints in Bayesian Neural Networks

1 code implementation NeurIPS 2020 Wanqian Yang, Lars Lorch, Moritz A. Graule, Himabindu Lakkaraju, Finale Doshi-Velez

Domains where supervised models are deployed often come with task-specific constraints, such as prior expert knowledge on the ground-truth function, or desiderata like safety and fairness.

Fairness Uncertainty Quantification

Beyond Individualized Recourse: Interpretable and Interactive Summaries of Actionable Recourses

1 code implementation NeurIPS 2020 Kaivalya Rawal, Himabindu Lakkaraju

As predictive models are increasingly being deployed in high-stakes decision-making, there has been a lot of interest in developing algorithms which can provide recourses to affected individuals.

counterfactual Decision Making

Reliable Post hoc Explanations: Modeling Uncertainty in Explainability

1 code implementation NeurIPS 2021 Dylan Slack, Sophie Hilgard, Sameer Singh, Himabindu Lakkaraju

In this paper, we address the aforementioned challenges by developing a novel Bayesian framework for generating local explanations along with their associated uncertainty.

Feature Importance

Fair Influence Maximization: A Welfare Optimization Approach

no code implementations14 Jun 2020 Aida Rahmattalabi, Shahin Jabbari, Himabindu Lakkaraju, Phebe Vayanos, Max Izenberg, Ryan Brown, Eric Rice, Milind Tambe

Under this framework, the trade-off between fairness and efficiency can be controlled by a single inequality aversion design parameter.

Fairness Management

"How do I fool you?": Manipulating User Trust via Misleading Black Box Explanations

no code implementations15 Nov 2019 Himabindu Lakkaraju, Osbert Bastani

Our work is the first to empirically establish how user trust in black box models can be manipulated via misleading explanations.

Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods

2 code implementations6 Nov 2019 Dylan Slack, Sophie Hilgard, Emily Jia, Sameer Singh, Himabindu Lakkaraju

Our approach can be used to scaffold any biased classifier in such a way that its predictions on the input data distribution still remain biased, but the post hoc explanations of the scaffolded classifier look innocuous.

Interpretable & Explorable Approximations of Black Box Models

no code implementations4 Jul 2017 Himabindu Lakkaraju, Ece Kamar, Rich Caruana, Jure Leskovec

To the best of our knowledge, this is the first approach which can produce global explanations of the behavior of any given black box model through joint optimization of unambiguity, fidelity, and interpretability, while also allowing users to explore model behavior based on their preferences.

Confusions over Time: An Interpretable Bayesian Model to Characterize Trends in Decision Making

no code implementations NeurIPS 2016 Himabindu Lakkaraju, Jure Leskovec

We propose Confusions over Time (CoT), a novel generative framework which facilitates a multi-granular analysis of the decision making process.

Decision Making

Learning Cost-Effective and Interpretable Regimes for Treatment Recommendation

no code implementations23 Nov 2016 Himabindu Lakkaraju, Cynthia Rudin

We formulate this as a problem of learning a decision list -- a sequence of if-then-else rules -- which maps characteristics of subjects (eg., diagnostic test results of patients) to treatments.

Identifying Unknown Unknowns in the Open World: Representations and Policies for Guided Exploration

no code implementations28 Oct 2016 Himabindu Lakkaraju, Ece Kamar, Rich Caruana, Eric Horvitz

Predictive models deployed in the real world may assign incorrect labels to instances with high confidence.

Learning Cost-Effective Treatment Regimes using Markov Decision Processes

no code implementations21 Oct 2016 Himabindu Lakkaraju, Cynthia Rudin

We formulate this as a problem of learning a decision list -- a sequence of if-then-else rules -- which maps characteristics of subjects (eg., diagnostic test results of patients) to treatments.

Cannot find the paper you are looking for? You can Submit a new open access paper.