Search Results for author: Himabindu Lakkaraju

Found 82 papers, 35 papers with code

Robust Black Box Explanations Under Distribution Shift

no code implementations ICML 2020 Himabindu Lakkaraju, Nino Arsov, Osbert Bastani

As machine learning black boxes are increasingly being deployed in real-world applications, there has been a growing interest in developing post hoc explanations that summarize the behaviors of these black box models.

How Memory Management Impacts LLM Agents: An Empirical Study of Experience-Following Behavior

1 code implementation 21 May 2025 Zidi Xiong, Yuping Lin, Wenya Xie, Pengfei He, Jiliang Tang, Himabindu Lakkaraju, Zhen Xiang

In this paper, we conduct an empirical study on how memory management choices impact the LLM agents' behavior, especially their long-term performance.

Large Language Model Management

Interpretability Illusions with Sparse Autoencoders: Evaluating Robustness of Concept Representations

1 code implementation 21 May 2025 Aaron J. Li, Suraj Srinivas, Usha Bhalla, Himabindu Lakkaraju

Sparse autoencoders (SAEs) are commonly used to interpret the internal activations of large language models (LLMs) by mapping them to human-interpretable concept representations.

Disentanglement

Measuring the Faithfulness of Thinking Drafts in Large Reasoning Models

no code implementations 19 May 2025 Zidi Xiong, Chen Shan, Zhenting Qi, Himabindu Lakkaraju

Large Reasoning Models (LRMs) have significantly enhanced their capabilities in complex problem-solving by introducing a thinking draft that enables multi-path Chain-of-Thought explorations before producing final answers.

counterfactual

Soft Best-of-n Sampling for Model Alignment

no code implementations 6 May 2025 Claudio Mayrink Verdun, Alex Oesterling, Himabindu Lakkaraju, Flavio P. Calmon

BoN yields high reward values in practice at a distortion cost, as measured by the KL-divergence between the sampled and original distribution.

Language Modeling
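As background for the abstract above, the sketch below shows plain (hard) best-of-n sampling against a reward model, together with the commonly used analytical estimate of its KL distortion, log n - (n - 1)/n. The `generate` and `reward` functions are hypothetical stand-ins, and the soft variant proposed in the paper is not reproduced here.

```python
import math
import random

def generate(prompt: str) -> str:
    """Hypothetical stand-in for sampling one response from the base LLM."""
    return f"response-{random.randint(0, 10_000)} to {prompt!r}"

def reward(response: str) -> float:
    """Hypothetical stand-in for a learned reward model."""
    return random.random()

def best_of_n(prompt: str, n: int) -> str:
    """Hard best-of-n: draw n samples and keep the highest-reward one."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=reward)

def bon_kl_estimate(n: int) -> float:
    """Commonly used estimate of KL(best-of-n distribution || base distribution)."""
    return math.log(n) - (n - 1) / n

if __name__ == "__main__":
    print(best_of_n("Explain KL divergence briefly.", n=8))
    print(f"KL distortion estimate for n=8: {bon_kl_estimate(8):.3f} nats")
```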

How Post-Training Reshapes LLMs: A Mechanistic View on Knowledge, Truthfulness, Refusal, and Confidence

no code implementations 3 Apr 2025 Hongzhe Du, Weikai Li, Min Cai, Karim Saraipour, Zimin Zhang, Himabindu Lakkaraju, Yizhou Sun, Shichang Zhang

Post-training is essential for the success of large language models (LLMs), transforming pre-trained base models into more useful and aligned post-trained models.

Towards Interpretable Soft Prompts

1 code implementation 2 Apr 2025 Oam Patel, Jason Wang, Nikhil Shivakumar Nayak, Suraj Srinivas, Himabindu Lakkaraju

Instead, our framework inspires a new direction of trainable prompting methods that explicitly optimizes for interpretability.

Detecting LLM-Generated Peer Reviews

1 code implementation 20 Mar 2025 Vishisht Rao, Aounon Kumar, Himabindu Lakkaraju, Nihar B. Shah

We also empirically find that our approach is resilient to common reviewer defenses, and that the bounds on error rates in our statistical tests hold in practice.

Generalizing Trust: Weak-to-Strong Trustworthiness in Language Models

no code implementations 31 Dec 2024 Martin Pawelczyk, Lillian Sun, Zhenting Qi, Aounon Kumar, Himabindu Lakkaraju

A key phenomenon known as weak-to-strong generalization - where a strong model trained on a weak model's outputs surpasses the weak model in task performance - has gained significant attention.

Fairness

On the Impact of Fine-Tuning on Chain-of-Thought Reasoning

no code implementations 22 Nov 2024 Elita Lobo, Chirag Agarwal, Himabindu Lakkaraju

Large language models have emerged as powerful tools for general intelligence, showcasing advanced natural language processing capabilities that find applications across diverse domains.

Towards Unifying Interpretability and Control: Evaluation via Intervention

1 code implementation 7 Nov 2024 Usha Bhalla, Suraj Srinivas, Asma Ghandeharioun, Himabindu Lakkaraju

We introduce two new evaluation metrics: intervention success rate and the coherence-intervention tradeoff, designed to measure the accuracy of explanations and their utility in controlling model behavior.

Generalized Group Data Attribution

no code implementations 13 Oct 2024 Dan Ley, Suraj Srinivas, Shichang Zhang, Gili Rusak, Himabindu Lakkaraju

Data Attribution (DA) methods quantify the influence of individual training data points on model outputs and have broad applications such as explainability, data selection, and noisy label identification.

Computational Efficiency

Quantifying Generalization Complexity for Large Language Models

1 code implementation 2 Oct 2024 Zhenting Qi, Hongyin Luo, Xuliang Huang, Zhuokai Zhao, Yibo Jiang, Xiangjun Fan, Himabindu Lakkaraju, James Glass

Scylla disentangles generalization from memorization via assessing model performance on both in-distribution (ID) and out-of-distribution (OOD) data through 20 tasks across 5 levels of complexity.

Memorization

Learning Recourse Costs from Pairwise Feature Comparisons

1 code implementation 20 Sep 2024 Kaivalya Rawal, Himabindu Lakkaraju

We demonstrate the efficient learning of individual feature costs using MAP estimates, and show that these non-exhaustive human surveys, which do not necessarily contain data for each feature pair comparison, are sufficient to learn an exhaustive set of feature costs, where each feature is associated with a modification cost.
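One plausible way to turn pairwise "feature i is harder to change than feature j" survey responses into per-feature costs, in the spirit of the abstract above, is a Bradley-Terry-style MAP estimate with a Gaussian prior. The sketch below is illustrative under those assumptions, not the paper's exact model.

```python
import numpy as np

def map_feature_costs(n_features, comparisons, prior_var=1.0, lr=0.1, steps=2000):
    """MAP-estimate per-feature modification costs from pairwise comparisons.

    comparisons: list of (i, j) pairs meaning "feature i is costlier to change than j".
    Bradley-Terry-style likelihood: P(i beats j) = sigmoid(c[i] - c[j]),
    with an isotropic Gaussian prior on the cost vector c.
    """
    c = np.zeros(n_features)
    for _ in range(steps):
        grad = -c / prior_var                      # gradient of the log-prior
        for i, j in comparisons:
            p = 1.0 / (1.0 + np.exp(-(c[i] - c[j])))
            grad[i] += 1.0 - p                     # gradient of the log-likelihood
            grad[j] -= 1.0 - p
        c += lr * grad / max(len(comparisons), 1)  # gradient ascent on the log-posterior
    return c - c.min()                             # shift so the cheapest feature has cost 0

# Example: 3 features; the survey says feature 0 is costlier than 1, and 1 costlier than 2.
print(map_feature_costs(3, [(0, 1), (1, 2), (0, 2)]))
```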

Operationalizing the Blueprint for an AI Bill of Rights: Recommendations for Practitioners, Researchers, and Policy Makers

no code implementations 11 Jul 2024 Alex Oesterling, Usha Bhalla, Suresh Venkatasubramanian, Himabindu Lakkaraju

In this write-up, we address this shortcoming by providing an accessible overview of existing literature related to operationalizing regulatory principles.

Fairness

On the Hardness of Faithful Chain-of-Thought Reasoning in Large Language Models

no code implementations 15 Jun 2024 Sree Harsha Tanneru, Dan Ley, Chirag Agarwal, Himabindu Lakkaraju

In this work, we explore the promise of three broad approaches commonly employed to steer the behavior of LLMs to enhance the faithfulness of the CoT reasoning generated by LLMs: in-context learning, fine-tuning, and activation editing.

In-Context Learning, Question Answering

Interpretability Needs a New Paradigm

no code implementations 8 May 2024 Andreas Madsen, Himabindu Lakkaraju, Siva Reddy, Sarath Chandar

At present, interpretability is divided into two paradigms: the intrinsic paradigm, which believes that only models designed to be explained can be explained, and the post-hoc paradigm, which believes that black-box models can be explained.

More RLHF, More Trust? On The Impact of Preference Alignment On Trustworthiness

1 code implementation 29 Apr 2024 Aaron J. Li, Satyapriya Krishna, Himabindu Lakkaraju

The trustworthiness of Large Language Models (LLMs) refers to the extent to which their outputs are reliable, safe, and ethically aligned, and it has become a crucial consideration alongside their cognitive performance.

Ethics, Language Modelling

Manipulating Large Language Models to Increase Product Visibility

1 code implementation 11 Apr 2024 Aounon Kumar, Himabindu Lakkaraju

We demonstrate that adding a strategic text sequence (STS) -- a carefully crafted message -- to a product's information page can significantly increase its likelihood of being listed as the LLM's top recommendation.

STS

Data Poisoning Attacks on Off-Policy Policy Evaluation Methods

no code implementations 6 Apr 2024 Elita Lobo, Harvineet Singh, Marek Petrik, Cynthia Rudin, Himabindu Lakkaraju

Off-policy Evaluation (OPE) methods are a crucial tool for evaluating policies in high-stakes domains such as healthcare, where exploration is often infeasible, unethical, or expensive.

Data Poisoning, Off-policy evaluation

MedSafetyBench: Evaluating and Improving the Medical Safety of Large Language Models

1 code implementation 6 Mar 2024 Tessa Han, Aounon Kumar, Chirag Agarwal, Himabindu Lakkaraju

As large language models (LLMs) develop increasingly sophisticated capabilities and find applications in medical settings, it becomes important to assess their medical safety due to their far-reaching implications for personal and public health, patient safety, and human rights.

Ethics, General Knowledge

Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems

1 code implementation 27 Feb 2024 Zhenting Qi, Hanlin Zhang, Eric Xing, Sham Kakade, Himabindu Lakkaraju

Retrieval-Augmented Generation (RAG) improves pre-trained models by incorporating external knowledge at test time to enable customized adaptation.

Instruction Following, RAG

OpenHEXAI: An Open-Source Framework for Human-Centered Evaluation of Explainable Machine Learning

no code implementations 20 Feb 2024 Jiaqi Ma, Vivian Lai, Yiming Zhang, Chacha Chen, Paul Hamilton, Davor Ljubenkov, Himabindu Lakkaraju, Chenhao Tan

However, properly evaluating the effectiveness of the XAI methods inevitably requires the involvement of human subjects, and conducting human-centered benchmarks is challenging in a number of ways: designing and implementing user studies is complex; numerous design choices in the design space of user study lead to problems of reproducibility; and running user studies can be challenging and even daunting for machine learning researchers.

Decision Making, Fairness

Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE)

1 code implementation 16 Feb 2024 Usha Bhalla, Alex Oesterling, Suraj Srinivas, Flavio P. Calmon, Himabindu Lakkaraju

In this work, we show that the semantic structure of CLIP's latent space can be leveraged to provide interpretability, allowing for the decomposition of representations into semantic concepts.

Model Editing

Towards Uncovering How Large Language Model Works: An Explainability Perspective

no code implementations 16 Feb 2024 Haiyan Zhao, Fan Yang, Bo Shen, Himabindu Lakkaraju, Mengnan Du

Large language models (LLMs) have led to breakthroughs in language tasks, yet the internal mechanisms that enable their remarkable generalization and reasoning abilities remain opaque.

Hallucination, Language Modeling

Understanding the Effects of Iterative Prompting on Truthfulness

no code implementations 9 Feb 2024 Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju

The development of Large Language Models (LLMs) has notably transformed numerous sectors, offering impressive text generation capabilities.

Text Generation

Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models

no code implementations 7 Feb 2024 Chirag Agarwal, Sree Harsha Tanneru, Himabindu Lakkaraju

We highlight that the current trend towards increasing the plausibility of explanations, primarily driven by the demand for user-friendly interfaces, may come at the cost of diminishing their faithfulness.

Decision Making

A Study on the Calibration of In-context Learning

1 code implementation 7 Dec 2023 Hanlin Zhang, Yi-Fan Zhang, Yaodong Yu, Dhruv Madeka, Dean Foster, Eric Xing, Himabindu Lakkaraju, Sham Kakade

Accurate uncertainty quantification is crucial for the safe deployment of machine learning models, and prior research has demonstrated improvements in the calibration of modern language models (LMs).

In-Context Learning, Natural Language Understanding

Quantifying Uncertainty in Natural Language Explanations of Large Language Models

1 code implementation 6 Nov 2023 Sree Harsha Tanneru, Chirag Agarwal, Himabindu Lakkaraju

In this work, we make one of the first attempts at quantifying the uncertainty in explanations of LLMs.

In-Context Unlearning: Language Models as Few Shot Unlearners

2 code implementations 11 Oct 2023 Martin Pawelczyk, Seth Neel, Himabindu Lakkaraju

Machine unlearning, the study of efficiently removing the impact of specific training instances on a model, has garnered increased attention in recent years due to regulatory guidelines such as the Right to be Forgotten.

Machine Unlearning
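The abstract describes removing the influence of specific training instances without updating model parameters. A minimal sketch of how an in-context unlearning prompt might be assembled is shown below; the label-flipping rule and prompt formatting are illustrative assumptions, not necessarily the paper's exact recipe.

```python
def build_unlearning_prompt(forget_examples, retain_examples, query,
                            labels=("negative", "positive")):
    """Build an in-context unlearning prompt: forget-set points appear with flipped
    labels, retain-set points with their true labels; no parameters are updated."""
    lines = []
    for text, label in forget_examples:
        flipped = labels[1 - labels.index(label)]   # illustrative label-flipping rule
        lines.append(f"Review: {text}\nSentiment: {flipped}")
    for text, label in retain_examples:
        lines.append(f"Review: {text}\nSentiment: {label}")
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

prompt = build_unlearning_prompt(
    forget_examples=[("I loved this movie.", "positive")],
    retain_examples=[("Terrible plot and acting.", "negative"),
                     ("A delightful surprise.", "positive")],
    query="Not worth the ticket price.",
)
print(prompt)
```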

In-Context Explainers: Harnessing LLMs for Explaining Black Box Models

1 code implementation 9 Oct 2023 Nicholas Kroeger, Dan Ley, Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju

Despite their effectiveness in enhancing the performance of LLMs on diverse language and tabular tasks, these methods have not been thoroughly explored for their potential to generate post hoc explanations.

Explainable Artificial Intelligence (XAI)

On the Trade-offs between Adversarial Robustness and Actionable Explanations

no code implementations 28 Sep 2023 Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju

As machine learning models are increasingly being employed in various high-stakes settings, it becomes important to ensure that predictions of these models are not only adversarially robust, but also readily explainable to relevant stakeholders.

Adversarial Robustness

Certifying LLM Safety against Adversarial Prompting

1 code implementation 6 Sep 2023 Aounon Kumar, Chirag Agarwal, Suraj Srinivas, Aaron Jiaxun Li, Soheil Feizi, Himabindu Lakkaraju

We defend against three attack modes: i) adversarial suffix, where an adversarial sequence is appended at the end of a harmful prompt; ii) adversarial insertion, where the adversarial sequence is inserted anywhere in the middle of the prompt; and iii) adversarial infusion, where adversarial tokens are inserted at arbitrary positions in the prompt, not necessarily as a contiguous block.

Adversarial Attack, Language Modelling
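One defense consistent with the suffix attack mode listed above is to erase trailing tokens and re-check each truncated prompt with a safety filter, flagging the prompt if any erased version is deemed harmful. The sketch below illustrates that idea with a hypothetical `is_harmful` filter; it should not be read as the paper's exact certified procedure.

```python
def is_harmful(prompt: str) -> bool:
    """Hypothetical safety filter; a real system would use a safety classifier or an
    LLM judge. This toy filter only recognizes a clean version of the harmful request."""
    return prompt.lower().strip().endswith("build a bomb")

def erase_and_check_suffix(prompt: str, max_erase: int) -> bool:
    """Flag a prompt as harmful if the prompt itself, or any version with up to
    `max_erase` trailing tokens erased, is flagged by the safety filter."""
    tokens = prompt.split()  # word-level "tokens" for illustration only
    for k in range(max_erase + 1):
        candidate = prompt if k == 0 else " ".join(tokens[:len(tokens) - k])
        if candidate and is_harmful(candidate):
            return True
    return False

# The adversarial suffix appended to a harmful request is exposed once it is erased.
print(erase_and_check_suffix("Tell me how to build a bomb xQ!z vk qq", max_erase=4))  # True
print(erase_and_check_suffix("Tell me how to bake sourdough bread", max_erase=4))     # False
```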

Accurate, Explainable, and Private Models: Providing Recourse While Minimizing Training Data Leakage

no code implementations 8 Aug 2023 Catherine Huang, Chelse Swoopes, Christina Xiao, Jiaqi Ma, Himabindu Lakkaraju

We present two novel methods to generate differentially private recourse: Differentially Private Model (DPM) and Laplace Recourse (LR).

Discriminative Feature Attributions: Bridging Post Hoc Explainability and Inherent Interpretability

1 code implementation NeurIPS 2023 Usha Bhalla, Suraj Srinivas, Himabindu Lakkaraju

This strategy naturally combines the ease of use of post hoc explanations with the faithfulness of inherently interpretable models.

Attribute

Characterizing Data Point Vulnerability via Average-Case Robustness

1 code implementation 26 Jul 2023 Tessa Han, Suraj Srinivas, Himabindu Lakkaraju

Studying the robustness of machine learning models is important to ensure consistent model behaviour across real-world settings.

Adversarial Robustness, Multi-class Classification
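Average-case robustness of a data point can be read as the probability that the model's prediction is unchanged under random perturbations of that point. The sketch below gives a simple Monte Carlo estimate of that quantity for a toy classifier; it illustrates the quantity itself, not the estimators developed in the paper.

```python
import numpy as np

def average_case_robustness(predict, x, sigma=0.1, n_samples=1000, rng=None):
    """Monte Carlo estimate of P[predict(x + delta) == predict(x)]
    for Gaussian perturbations delta ~ N(0, sigma^2 I)."""
    rng = rng or np.random.default_rng(0)
    base = predict(x)
    noise = rng.normal(scale=sigma, size=(n_samples, x.shape[0]))
    same = sum(predict(x + d) == base for d in noise)
    return same / n_samples

# Toy linear classifier: points near the decision boundary are more vulnerable.
predict = lambda x: int(x @ np.array([1.0, -1.0]) > 0.0)
print(average_case_robustness(predict, np.array([2.0, 0.0])))   # far from boundary -> close to 1.0
print(average_case_robustness(predict, np.array([0.05, 0.0])))  # near boundary -> noticeably lower
```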

Analyzing Chain-of-Thought Prompting in Large Language Models via Gradient-based Feature Attributions

no code implementations 25 Jul 2023 Skyler Wu, Eric Meng Shen, Charumathi Badrinath, Jiaqi Ma, Himabindu Lakkaraju

Chain-of-thought (CoT) prompting has been shown to empirically improve the accuracy of large language models (LLMs) on various question answering tasks.

Question Answering

On Minimizing the Impact of Dataset Shifts on Actionable Explanations

no code implementations 11 Jun 2023 Anna P. Meyer, Dan Ley, Suraj Srinivas, Himabindu Lakkaraju

To this end, we conduct rigorous theoretical analysis to demonstrate that model curvature, weight decay parameters while training, and the magnitude of the dataset shift are key factors that determine the extent of explanation (in)stability.

Consistent Explanations in the Face of Model Indeterminacy via Ensembling

no code implementations 9 Jun 2023 Dan Ley, Leonard Tang, Matthew Nazari, Hongjin Lin, Suraj Srinivas, Himabindu Lakkaraju

This work addresses the challenge of providing consistent explanations for predictive models in the presence of model indeterminacy, which arises due to the existence of multiple (nearly) equally well-performing models for a given dataset and task.

Word-Level Explanations for Analyzing Bias in Text-to-Image Models

no code implementations 3 Jun 2023 Alexander Lin, Lucas Monteiro Paes, Sree Harsha Tanneru, Suraj Srinivas, Himabindu Lakkaraju

We introduce a method for computing scores for each word in the prompt; these scores represent its influence on biases in the model's output.

Sentence

Towards Bridging the Gaps between the Right to Explanation and the Right to be Forgotten

no code implementations 8 Feb 2023 Satyapriya Krishna, Jiaqi Ma, Himabindu Lakkaraju

The Right to Explanation and the Right to be Forgotten are two important principles outlined to regulate algorithmic decision making and data usage in real-world applications.

Decision Making

On the Privacy Risks of Algorithmic Recourse

1 code implementation 10 Nov 2022 Martin Pawelczyk, Himabindu Lakkaraju, Seth Neel

As predictive models are increasingly being employed to make consequential decisions, there is a growing emphasis on developing techniques that can provide algorithmic recourse to affected individuals.

Towards Robust Off-Policy Evaluation via Human Inputs

no code implementations 18 Sep 2022 Harvineet Singh, Shalmali Joshi, Finale Doshi-Velez, Himabindu Lakkaraju

When deployment environments are expected to undergo changes (that is, dataset shifts), it is important for OPE methods to perform robust evaluation of the policies amidst such changes.

Multi-Armed Bandits, Off-policy evaluation

Evaluating Explainability for Graph Neural Networks

1 code implementation 19 Aug 2022 Chirag Agarwal, Owen Queen, Himabindu Lakkaraju, Marinka Zitnik

As post hoc explanations are increasingly used to understand the behavior of graph neural networks (GNNs), it becomes crucial to evaluate the quality and reliability of GNN explanations.

TalkToModel: Explaining Machine Learning Models with Interactive Natural Language Conversations

1 code implementation 8 Jul 2022 Dylan Slack, Satyapriya Krishna, Himabindu Lakkaraju, Sameer Singh

In real-world evaluations with humans, 73% of healthcare workers (e.g., doctors and nurses) agreed they would use TalkToModel over baseline point-and-click systems for explainability in a disease prediction task, and 85% of ML professionals agreed TalkToModel was easier to use for computing explanations.

BIG-bench Machine Learning, Disease Prediction

OpenXAI: Towards a Transparent Evaluation of Model Explanations

2 code implementations 22 Jun 2022 Chirag Agarwal, Dan Ley, Satyapriya Krishna, Eshika Saxena, Martin Pawelczyk, Nari Johnson, Isha Puri, Marinka Zitnik, Himabindu Lakkaraju

OpenXAI comprises the following key components: (i) a flexible synthetic data generator and a collection of diverse real-world datasets, pre-trained models, and state-of-the-art feature attribution methods, and (ii) open-source implementations of eleven quantitative metrics for evaluating faithfulness, stability (robustness), and fairness of explanation methods, in turn providing comparisons of several explanation methods across a wide variety of metrics, models, and datasets.

Benchmarking, Explainable Artificial Intelligence (XAI)

Efficiently Training Low-Curvature Neural Networks

2 code implementations 14 Jun 2022 Suraj Srinivas, Kyle Matoba, Himabindu Lakkaraju, Francois Fleuret

To achieve this, we minimize a data-independent upper bound on the curvature of a neural network, which decomposes overall curvature in terms of curvatures and slopes of its constituent layers.

Adversarial Robustness

A Human-Centric Take on Model Monitoring

no code implementations 6 Jun 2022 Murtuza N Shergadwala, Himabindu Lakkaraju, Krishnaram Kenthapadi

Predictive models are increasingly used to make various consequential decisions in high-stakes domains such as healthcare, finance, and policy.

Fairness

Which Explanation Should I Choose? A Function Approximation Perspective to Characterizing Post Hoc Explanations

1 code implementation 2 Jun 2022 Tessa Han, Suraj Srinivas, Himabindu Lakkaraju

By bringing diverse explanation methods into a common framework, this work (1) advances the conceptual understanding of these methods, revealing their shared local function approximation objective, properties, and relation to one another, and (2) guides the use of these methods in practice, providing a principled approach to choose among methods and paving the way for the creation of new ones.

Fairness via Explanation Quality: Evaluating Disparities in the Quality of Post hoc Explanations

no code implementations 15 May 2022 Jessica Dai, Sohini Upadhyay, Ulrich Aivodji, Stephen H. Bach, Himabindu Lakkaraju

We then leverage these properties to propose a novel evaluation framework which can quantitatively measure disparities in the quality of explanations output by state-of-the-art methods.

Decision Making, Fairness

Rethinking Stability for Attribution-based Explanations

no code implementations 14 Mar 2022 Chirag Agarwal, Nari Johnson, Martin Pawelczyk, Satyapriya Krishna, Eshika Saxena, Marinka Zitnik, Himabindu Lakkaraju

As attribution-based explanation methods are increasingly used to establish model trustworthiness in high-stakes situations, it is critical to ensure that these explanations are stable, e.g., robust to infinitesimal perturbations to an input.

Probabilistically Robust Recourse: Navigating the Trade-offs between Costs and Robustness in Algorithmic Recourse

1 code implementation 13 Mar 2022 Martin Pawelczyk, Teresa Datta, Johannes van-den-Heuvel, Gjergji Kasneci, Himabindu Lakkaraju

To this end, we propose a novel objective function which simultaneously minimizes the gap between the achieved (resulting) and desired recourse invalidation rates, minimizes recourse costs, and also ensures that the resulting recourse achieves a positive model prediction.
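The abstract above names three ingredients: the gap between achieved and desired recourse invalidation rates, the recourse cost, and validity of the resulting prediction. A minimal sketch of such a composite objective, with illustrative weights and a toy Monte Carlo invalidation estimate, might look as follows; this is a hedged reading of the abstract, not the paper's exact objective.

```python
import numpy as np

def recourse_objective(x, x_cf, model_prob, invalidation_rate, target_rate=0.05,
                       lam_cost=0.1, lam_valid=1.0):
    """Composite recourse loss: (i) squared gap between the estimated invalidation rate
    of the counterfactual x_cf and the desired target rate, (ii) recourse cost as the
    L1 distance from the original point x, and (iii) a hinge-style penalty pushing the
    model's probability for the favorable class above 0.5."""
    invalidation_gap = (invalidation_rate(x_cf) - target_rate) ** 2
    cost = np.abs(x_cf - x).sum()
    validity = max(0.0, 0.5 - model_prob(x_cf))
    return invalidation_gap + lam_cost * cost + lam_valid * validity

# Toy usage: a logistic model and a Monte Carlo invalidation estimate under input noise.
w = np.array([1.0, 1.0])
model_prob = lambda z: 1.0 / (1.0 + np.exp(-(z @ w)))
rng = np.random.default_rng(0)
invalidation_rate = lambda z: np.mean(
    [(z + rng.normal(scale=0.3, size=2)) @ w <= 0 for _ in range(500)]
)
x = np.array([-1.0, 0.2])       # original (unfavorably classified) point
x_cf = np.array([0.6, 0.6])     # candidate recourse
print(recourse_objective(x, x_cf, model_prob, invalidation_rate))
```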

Rethinking Explainability as a Dialogue: A Practitioner's Perspective

1 code implementation 3 Feb 2022 Himabindu Lakkaraju, Dylan Slack, Yuxin Chen, Chenhao Tan, Sameer Singh

Overall, we hope our work serves as a starting place for researchers and engineers to design interactive explainability systems.

BIG-bench Machine Learning

The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective

1 code implementation 3 Feb 2022 Satyapriya Krishna, Tessa Han, Alex Gu, Steven Wu, Shahin Jabbari, Himabindu Lakkaraju

In addition, we carry out an online user study with data scientists to understand how they resolve the aforementioned disagreements.

BIG-bench Machine Learning

What will it take to generate fairness-preserving explanations?

no code implementations 24 Jun 2021 Jessica Dai, Sohini Upadhyay, Stephen H. Bach, Himabindu Lakkaraju

In situations where explanations of black-box models may be useful, the fairness of the black-box is also often a relevant concern.

Fairness

Feature Attributions and Counterfactual Explanations Can Be Manipulated

no code implementations 23 Jun 2021 Dylan Slack, Sophie Hilgard, Sameer Singh, Himabindu Lakkaraju

As machine learning models are increasingly used in critical decision-making settings (e.g., healthcare, finance), there has been a growing emphasis on developing methods to explain model predictions.

BIG-bench Machine Learning, counterfactual

Exploring Counterfactual Explanations Through the Lens of Adversarial Examples: A Theoretical and Empirical Analysis

no code implementations 18 Jun 2021 Martin Pawelczyk, Chirag Agarwal, Shalmali Joshi, Sohini Upadhyay, Himabindu Lakkaraju

As machine learning (ML) models become more widely deployed in high-stakes applications, counterfactual explanations have emerged as key tools for providing actionable model explanations in practice.

counterfactual, Counterfactual Explanation

Probing GNN Explainers: A Rigorous Theoretical and Empirical Analysis of GNN Explanation Methods

no code implementations 16 Jun 2021 Chirag Agarwal, Marinka Zitnik, Himabindu Lakkaraju

As Graph Neural Networks (GNNs) are increasingly being employed in critical real-world applications, several methods have been proposed in recent literature to explain the predictions of these models.

Fairness

Counterfactual Explanations Can Be Manipulated

no code implementations NeurIPS 2021 Dylan Slack, Sophie Hilgard, Himabindu Lakkaraju, Sameer Singh

In this work, we introduce the first framework that describes the vulnerabilities of counterfactual explanations and shows how they can be manipulated.

counterfactual, Counterfactual Explanation

Learning Under Adversarial and Interventional Shifts

no code implementations 29 Mar 2021 Harvineet Singh, Shalmali Joshi, Finale Doshi-Velez, Himabindu Lakkaraju

Most of the existing work focuses on optimizing for either adversarial shifts or interventional shifts.

Towards Robust and Reliable Algorithmic Recourse

no code implementations NeurIPS 2021 Sohini Upadhyay, Shalmali Joshi, Himabindu Lakkaraju

To address this problem, we propose a novel framework, RObust Algorithmic Recourse (ROAR), that leverages adversarial training for finding recourses that are robust to model shifts.

Decision Making

Towards a Unified Framework for Fair and Stable Graph Representation Learning

3 code implementations 25 Feb 2021 Chirag Agarwal, Himabindu Lakkaraju, Marinka Zitnik

In this work, we establish a key connection between counterfactual fairness and stability and leverage it to propose a novel framework, NIFTY (uNIfying Fairness and stabiliTY), which can be used with any GNN to learn fair and stable representations.

counterfactual, Fairness

Towards the Unification and Robustness of Perturbation and Gradient Based Explanations

no code implementations 21 Feb 2021 Sushant Agarwal, Shahin Jabbari, Chirag Agarwal, Sohini Upadhyay, Zhiwei Steven Wu, Himabindu Lakkaraju

As machine learning black boxes are increasingly being deployed in critical domains such as healthcare and criminal justice, there has been a growing emphasis on developing techniques for explaining these black boxes in a post hoc manner.

Algorithmic Recourse in the Wild: Understanding the Impact of Data and Model Shifts

no code implementations 22 Dec 2020 Kaivalya Rawal, Ece Kamar, Himabindu Lakkaraju

Our theoretical results establish a lower bound on the probability of recourse invalidation due to model shifts, and show the existence of a tradeoff between this invalidation probability and typical notions of "cost" minimized by modern recourse generation algorithms.

Does Fair Ranking Improve Minority Outcomes? Understanding the Interplay of Human and Algorithmic Biases in Online Hiring

no code implementations 1 Dec 2020 Tom Sühr, Sophie Hilgard, Himabindu Lakkaraju

In this work, we analyze various sources of gender biases in online hiring platforms, including the job context and inherent biases of employers and establish how these factors interact with ranking algorithms to affect hiring decisions.

Robust and Stable Black Box Explanations

no code implementations 12 Nov 2020 Himabindu Lakkaraju, Nino Arsov, Osbert Bastani

To the best of our knowledge, this work makes the first attempt at generating post hoc explanations that are robust to a general class of adversarial perturbations that are of practical interest.

Learning Models for Actionable Recourse

1 code implementation NeurIPS 2021 Alexis Ross, Himabindu Lakkaraju, Osbert Bastani

As machine learning models are increasingly deployed in high-stakes domains such as legal and financial decision-making, there has been growing interest in post-hoc methods for generating counterfactual explanations.

counterfactual, Decision Making

When Does Uncertainty Matter?: Understanding the Impact of Predictive Uncertainty in ML Assisted Decision Making

no code implementations 12 Nov 2020 Sean McGrath, Parth Mehta, Alexandra Zytek, Isaac Lage, Himabindu Lakkaraju

As machine learning (ML) models are increasingly being employed to assist human decision makers, it becomes critical to provide these decision makers with relevant inputs which can help them decide if and how to incorporate model predictions into their decision making.

Decision Making

Incorporating Interpretable Output Constraints in Bayesian Neural Networks

1 code implementation NeurIPS 2020 Wanqian Yang, Lars Lorch, Moritz A. Graule, Himabindu Lakkaraju, Finale Doshi-Velez

Domains where supervised models are deployed often come with task-specific constraints, such as prior expert knowledge on the ground-truth function, or desiderata like safety and fairness.

Fairness, Uncertainty Quantification

Beyond Individualized Recourse: Interpretable and Interactive Summaries of Actionable Recourses

1 code implementation NeurIPS 2020 Kaivalya Rawal, Himabindu Lakkaraju

As predictive models are increasingly being deployed in high-stakes decision-making, there has been a lot of interest in developing algorithms which can provide recourses to affected individuals.

counterfactual, Decision Making

Reliable Post hoc Explanations: Modeling Uncertainty in Explainability

1 code implementation NeurIPS 2021 Dylan Slack, Sophie Hilgard, Sameer Singh, Himabindu Lakkaraju

In this paper, we address the aforementioned challenges by developing a novel Bayesian framework for generating local explanations along with their associated uncertainty.

Feature Importance

Fair Influence Maximization: A Welfare Optimization Approach

no code implementations 14 Jun 2020 Aida Rahmattalabi, Shahin Jabbari, Himabindu Lakkaraju, Phebe Vayanos, Max Izenberg, Ryan Brown, Eric Rice, Milind Tambe

Under this framework, the trade-off between fairness and efficiency can be controlled by a single inequality aversion design parameter.

Fairness, Management

"How do I fool you?": Manipulating User Trust via Misleading Black Box Explanations

no code implementations 15 Nov 2019 Himabindu Lakkaraju, Osbert Bastani

Our work is the first to empirically establish how user trust in black box models can be manipulated via misleading explanations.

Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods

2 code implementations 6 Nov 2019 Dylan Slack, Sophie Hilgard, Emily Jia, Sameer Singh, Himabindu Lakkaraju

Our approach can be used to scaffold any biased classifier in such a way that its predictions on the input data distribution still remain biased, but the post hoc explanations of the scaffolded classifier look innocuous.
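The scaffolding described above hinges on telling real inputs apart from the synthetic perturbations that LIME and SHAP generate: real queries are routed to the biased model, while the explainer's off-manifold probes are answered by an innocuous model. The sketch below illustrates this with placeholder models and a toy out-of-distribution test; all names are hypothetical.

```python
import numpy as np

class ScaffoldedClassifier:
    """Wraps a biased model so that perturbation-based explainers see an innocuous one.

    ood_detector(x) should return True for the synthetic, off-manifold points that
    LIME/SHAP generate and False for in-distribution inputs."""

    def __init__(self, biased_model, innocuous_model, ood_detector):
        self.biased_model = biased_model
        self.innocuous_model = innocuous_model
        self.ood_detector = ood_detector

    def predict(self, X):
        X = np.atleast_2d(X)
        use_innocuous = np.array([self.ood_detector(x) for x in X])
        # Both models are evaluated on every point for simplicity of the sketch.
        return np.where(use_innocuous,
                        [self.innocuous_model(x) for x in X],
                        [self.biased_model(x) for x in X])

# Toy example: feature 0 is a sensitive attribute the biased model keys on,
# feature 1 is a benign feature; wildly out-of-range points are treated as OOD.
biased = lambda x: int(x[0] > 0)          # discriminates on the sensitive feature
innocuous = lambda x: int(x[1] > 0)       # depends only on the benign feature
is_ood = lambda x: bool(np.any(np.abs(x) > 3))

clf = ScaffoldedClassifier(biased, innocuous, is_ood)
print(clf.predict(np.array([[1.0, -1.0],      # real point -> biased prediction (1)
                            [5.0, -1.0]])))   # explainer-style OOD probe -> innocuous (0)
```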

Interpretable & Explorable Approximations of Black Box Models

no code implementations 4 Jul 2017 Himabindu Lakkaraju, Ece Kamar, Rich Caruana, Jure Leskovec

To the best of our knowledge, this is the first approach which can produce global explanations of the behavior of any given black box model through joint optimization of unambiguity, fidelity, and interpretability, while also allowing users to explore model behavior based on their preferences.

Confusions over Time: An Interpretable Bayesian Model to Characterize Trends in Decision Making

no code implementations NeurIPS 2016 Himabindu Lakkaraju, Jure Leskovec

We propose Confusions over Time (CoT), a novel generative framework which facilitates a multi-granular analysis of the decision making process.

Decision Making, Diagnostic

Learning Cost-Effective and Interpretable Regimes for Treatment Recommendation

no code implementations 23 Nov 2016 Himabindu Lakkaraju, Cynthia Rudin

We formulate this as a problem of learning a decision list -- a sequence of if-then-else rules -- which maps characteristics of subjects (e.g., diagnostic test results of patients) to treatments.

Diagnostic

Identifying Unknown Unknowns in the Open World: Representations and Policies for Guided Exploration

no code implementations 28 Oct 2016 Himabindu Lakkaraju, Ece Kamar, Rich Caruana, Eric Horvitz

Predictive models deployed in the real world may assign incorrect labels to instances with high confidence.

Learning Cost-Effective Treatment Regimes using Markov Decision Processes

no code implementations 21 Oct 2016 Himabindu Lakkaraju, Cynthia Rudin

We formulate this as a problem of learning a decision list -- a sequence of if-then-else rules -- which maps characteristics of subjects (e.g., diagnostic test results of patients) to treatments.

Diagnostic
