no code implementations • 28 Sep 2023 • Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju
As machine learning models are increasingly being employed in various high-stakes settings, it becomes important to ensure that predictions of these models are not only adversarially robust, but also readily explainable to relevant stakeholders.
no code implementations • 6 Sep 2023 • Aounon Kumar, Chirag Agarwal, Suraj Srinivas, Soheil Feizi, Hima Lakkaraju
For example, against adversarial suffixes of length 20, it certifiably detects 93% of the harmful prompts and labels 94% of the safe prompts as safe using the open source language model Llama 2 as the safety filter.
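For intuition, here is a minimal sketch of an erase-and-check style certified filter consistent with the result quoted above: a prompt is flagged as harmful if the safety filter flags it or any version with up to 20 trailing tokens erased, so a harmful prompt the filter already catches remains caught after an adversarial suffix of that length is appended. The word-level tokenization and the `is_harmful` stand-in are illustrative assumptions, not the paper's implementation.

```python
from typing import Callable, List

def erase_and_check_suffix(tokens: List[str],
                           is_harmful: Callable[[List[str]], bool],
                           max_erase: int = 20) -> bool:
    """Certified check against adversarial suffixes of length <= max_erase.

    The prompt is labeled harmful if the safety filter flags the prompt itself
    or any version with 1..max_erase trailing tokens erased. If the filter
    detects a harmful prompt P, then P with any adversarial suffix of length
    <= max_erase appended is also detected, because erasing that suffix
    reproduces P as one of the checked subsequences.
    """
    for k in range(0, min(max_erase, len(tokens)) + 1):
        candidate = tokens[:len(tokens) - k] if k > 0 else tokens
        if is_harmful(candidate):
            return True   # labeled harmful
    return False          # labeled safe

# Toy usage with a keyword filter standing in for an LLM safety classifier:
toy_filter = lambda toks: "attack-plan" in toks
print(erase_and_check_suffix("describe an attack-plan xq!!z".split(), toy_filter))
```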
no code implementations • 25 Jul 2023 • Shripad V. Deshmukh, Srivatsan R, Supriti Vijay, Jayakumar Subramanian, Chirag Agarwal
In this work, we present COUNTERPOL, the first framework to analyze RL policies using counterfactual explanations in the form of minimal changes to the policy that lead to the desired outcome.
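A conceptual sketch of the counterfactual idea described above, not COUNTERPOL's actual algorithm: on a toy bandit, it searches for the smallest change to the policy logits that pushes the expected return above a desired threshold. The penalty formulation and all constants are illustrative assumptions.

```python
import torch

# Known expected rewards per action in a 3-armed bandit (a toy stand-in for a
# policy-evaluation oracle in a full MDP).
rewards = torch.tensor([0.2, 0.5, 0.9])
theta_orig = torch.tensor([2.0, 1.0, -1.0])   # original policy logits
target_return = 0.8                            # desired outcome

theta_cf = theta_orig.clone().requires_grad_(True)
opt = torch.optim.Adam([theta_cf], lr=0.05)

for _ in range(500):
    opt.zero_grad()
    pi = torch.softmax(theta_cf, dim=0)
    expected_return = (pi * rewards).sum()
    proximity = ((theta_cf - theta_orig) ** 2).sum()    # stay close to the original policy
    shortfall = torch.clamp(target_return - expected_return, min=0.0)
    loss = proximity + 50.0 * shortfall ** 2            # penalty for missing the target
    loss.backward()
    opt.step()

print("counterfactual policy:", torch.softmax(theta_cf, 0).detach().numpy())
```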
no code implementations • 6 May 2023 • Shripad Vilasrao Deshmukh, Arpan Dasgupta, Balaji Krishnamurthy, Nan Jiang, Chirag Agarwal, Georgios Theocharous, Jayakumar Subramanian
To do so, we encode trajectories in the offline training data both individually and collectively (as sets of trajectories).

1 code implementation • 25 Apr 2023 • Michael Llordes, Debasis Ganguly, Sumit Bhatia, Chirag Agarwal
Neural retrieval models (NRMs) have been shown to outperform their statistical counterparts owing to their ability to capture semantic meaning via dense document representations.
no code implementations • CVPR 2023 • Ashish Seth, Mayur Hemani, Chirag Agarwal
These biases manifest as the skewed similarity between the representations for specific text concepts and images of people of different identity groups and, therefore, limit the usefulness of such models in real-world high-stakes applications.
1 code implementation • 26 Feb 2023 • Jiali Cheng, George Dasoulas, Huan He, Chirag Agarwal, Marinka Zitnik
Deleted Edge Consistency ensures that the influence of deleted elements is removed from both model weights and neighboring representations, while Neighborhood Influence guarantees that the remaining model knowledge is preserved after deletion.
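The sketch below reconstructs those two properties as training losses purely from the sentence above; the paper's actual objective may differ. One term pushes deleted edges to score like randomly sampled non-edges, the other keeps embeddings in the affected neighborhood close to their pre-deletion values.

```python
import torch
import torch.nn.functional as F

def unlearning_losses(z_before, z_after, deleted_edges, random_nonedges, neighborhood):
    """Illustrative losses for the two properties described above.

    z_before / z_after: node embeddings before / after unlearning, shape [N, d].
    deleted_edges, random_nonedges: LongTensors of shape [2, E].
    neighborhood: LongTensor of node ids in the local neighborhood of the deletions.
    """
    # Deleted Edge Consistency: after unlearning, a deleted edge should look no
    # more "connected" than a randomly sampled unconnected node pair.
    score = lambda z, e: (z[e[0]] * z[e[1]]).sum(dim=-1)   # dot-product edge score
    consistency = F.mse_loss(score(z_after, deleted_edges),
                             score(z_after, random_nonedges))

    # Neighborhood Influence: knowledge unrelated to the deletion is preserved,
    # i.e. embeddings in the surrounding neighborhood stay close to the originals.
    influence = F.mse_loss(z_after[neighborhood], z_before[neighborhood])

    return consistency + influence
```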
no code implementations • 17 Jan 2023 • Tarun Ram Menta, Surgan Jandial, Akash Patil, Vimal KB, Saketh Bachu, Balaji Krishnamurthy, Vineeth N. Balasubramanian, Chirag Agarwal, Mausoom Sarkar
As transfer learning techniques are increasingly used to transfer knowledge from a source model to a target task, it becomes important to quantify which source models are suitable for a given target task without performing computationally expensive fine-tuning.
1 code implementation • 30 Nov 2022 • Valentina Giunchiglia, Chirag Varun Shukla, Guadalupe Gonzalez, Chirag Agarwal
With the increasing use of Graph Neural Networks (GNNs) in critical real-world applications, several post hoc explanation methods have been proposed to understand their predictions.
1 code implementation • 19 Aug 2022 • Chirag Agarwal, Owen Queen, Himabindu Lakkaraju, Marinka Zitnik
As post hoc explanations are increasingly used to understand the behavior of graph neural networks (GNNs), it becomes crucial to evaluate the quality and reliability of GNN explanations.
1 code implementation • 22 Jun 2022 • Chirag Agarwal, Satyapriya Krishna, Eshika Saxena, Martin Pawelczyk, Nari Johnson, Isha Puri, Marinka Zitnik, Himabindu Lakkaraju
OpenXAI comprises the following key components: (i) a flexible synthetic data generator and a collection of diverse real-world datasets, pre-trained models, and state-of-the-art feature attribution methods, (ii) open-source implementations of twenty-two quantitative metrics for evaluating faithfulness, stability (robustness), and fairness of explanation methods, and (iii) the first ever public XAI leaderboards to benchmark explanations.
no code implementations • 14 Mar 2022 • Chirag Agarwal, Nari Johnson, Martin Pawelczyk, Satyapriya Krishna, Eshika Saxena, Marinka Zitnik, Himabindu Lakkaraju
As attribution-based explanation methods are increasingly used to establish model trustworthiness in high-stakes situations, it is critical to ensure that these explanations are stable, e.g., robust to infinitesimal perturbations to an input.
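As a generic illustration of that stability notion (not the specific metric proposed in the paper), the sketch below probes how much an attribution method's output moves relative to small random input perturbations; the `explain` callable and the noise scale are assumptions.

```python
import numpy as np

def explanation_stability(explain, x, sigma=1e-3, n_samples=20, seed=0):
    """Empirically probe the stability of an attribution method `explain`.

    Reports the worst-case ratio between the change in the explanation and the
    change in the input over small random perturbations. Large values indicate
    explanations that are not robust to near-infinitesimal input noise.
    """
    rng = np.random.default_rng(seed)
    e_x = explain(x)
    worst = 0.0
    for _ in range(n_samples):
        x_pert = x + rng.normal(scale=sigma, size=x.shape)
        ratio = np.linalg.norm(explain(x_pert) - e_x) / np.linalg.norm(x_pert - x)
        worst = max(worst, ratio)
    return worst

# Example: a linear model's gradient explanation is perfectly stable (ratio ~0).
w = np.array([0.5, -1.0, 2.0])
grad_explainer = lambda x: w          # gradient of w.x w.r.t. x is constant
print(explanation_stability(grad_explainer, np.zeros(3)))
```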
1 code implementation • 27 Jul 2021 • Daniel D'souza, Zach Nussbaum, Chirag Agarwal, Sara Hooker
As machine learning models are increasingly employed to assist human decision-makers, it becomes critical to communicate the uncertainty associated with these model predictions.
no code implementations • 18 Jun 2021 • Martin Pawelczyk, Chirag Agarwal, Shalmali Joshi, Sohini Upadhyay, Himabindu Lakkaraju
As machine learning (ML) models become more widely deployed in high-stakes applications, counterfactual explanations have emerged as key tools for providing actionable model explanations in practice.
no code implementations • 16 Jun 2021 • Chirag Agarwal, Marinka Zitnik, Himabindu Lakkaraju
As Graph Neural Networks (GNNs) are increasingly being employed in critical real-world applications, several methods have been proposed in recent literature to explain the predictions of these models.
2 code implementations • 25 Feb 2021 • Chirag Agarwal, Himabindu Lakkaraju, Marinka Zitnik
In this work, we establish a key connection between counterfactual fairness and stability and leverage it to propose a novel framework, NIFTY (uNIfying Fairness and stabiliTY), which can be used with any GNN to learn fair and stable representations.
no code implementations • 21 Feb 2021 • Sushant Agarwal, Shahin Jabbari, Chirag Agarwal, Sohini Upadhyay, Zhiwei Steven Wu, Himabindu Lakkaraju
As machine learning black boxes are increasingly being deployed in critical domains such as healthcare and criminal justice, there has been a growing emphasis on developing techniques for explaining these black boxes in a post hoc manner.
1 code implementation • CVPR 2022 • Chirag Agarwal, Daniel D'souza, Sara Hooker
In this work, we propose Variance of Gradients (VoG) as a valuable and efficient metric to rank data by difficulty and to surface a tractable subset of the most challenging examples for human-in-the-loop auditing.
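A minimal sketch of how such a score could be computed, based on the usual description of VoG as the per-pixel variance of input gradients across training checkpoints; the checkpoint selection and normalization details are assumptions rather than the paper's exact recipe.

```python
import torch

def variance_of_gradients(model_checkpoints, x, y):
    """VoG-style difficulty score for a single example (x, true class y).

    For each saved training checkpoint, compute the gradient of the true-class
    pre-softmax score with respect to the input pixels; then average the
    per-pixel variance of these gradients across checkpoints. Higher scores
    flag examples the model found harder or learned later.
    """
    grads = []
    for model in model_checkpoints:
        model.eval()
        x_req = x.clone().detach().requires_grad_(True)
        logits = model(x_req.unsqueeze(0))   # [1, num_classes]
        logits[0, y].backward()              # gradient of the true-class logit
        grads.append(x_req.grad.detach())
    grads = torch.stack(grads)               # [num_checkpoints, *x.shape]
    return grads.var(dim=0, unbiased=False).mean().item()
```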
1 code implementation • 16 Jun 2020 • Peijie Chen, Chirag Agarwal, Anh Nguyen
A growing number of similarities between human vision and convolutional neural networks (CNNs) have been revealed in the past few years.
1 code implementation • CVPR 2020 • Naman Bansal, Chirag Agarwal, Anh Nguyen
Attribution methods can provide powerful insights into the reasons for a classifier's decision.
no code implementations • 3 Feb 2020 • Chirag Agarwal, Shahin Khobahi, Arindam Bose, Mojtaba Soltanalian, Dan Schonfeld
The lack of interpretability in current deep learning models causes serious concerns as they are extensively used for various life-critical applications.
1 code implementation • 9 Oct 2019 • Chirag Agarwal, Anh Nguyen
Perturbation-based explanation methods often measure the contribution of an input feature to an image classifier's outputs by heuristically removing it via, e.g., blurring, adding noise, or graying out, which often produce unrealistic, out-of-distribution samples.
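For concreteness, the sketch below implements one such heuristic removal scheme (patch-wise blurring) under assumptions about a `predict` function that returns class probabilities; it illustrates the kind of baseline criticized here rather than the paper's proposed method.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blur_occlusion_attribution(predict, image, target_class, patch=8, sigma=5.0):
    """Heuristic removal-based attribution of the kind described above.

    Each patch of an H x W x C image is "removed" by replacing it with a
    blurred copy, and its attribution is the resulting drop in the
    target-class score. The blurred patches are exactly the kind of
    unrealistic, out-of-distribution inputs that motivate generative-model-based
    removal instead.
    """
    base_score = predict(image)[target_class]
    blurred = gaussian_filter(image, sigma=(sigma, sigma, 0))   # blur spatial dims only
    h, w = image.shape[:2]
    attribution = np.zeros((h, w))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            perturbed = image.copy()
            perturbed[i:i + patch, j:j + patch] = blurred[i:i + patch, j:j + patch]
            attribution[i:i + patch, j:j + patch] = base_score - predict(perturbed)[target_class]
    return attribution
```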
no code implementations • 25 Sep 2019 • Chirag Agarwal, Dan Schonfeld, Anh Nguyen
Interpretability methods often measure the contribution of an input feature to an image classifier's decisions by heuristically removing it via, e.g., blurring, adding noise, or graying out, which often produce unrealistic, out-of-distribution samples.
no code implementations • 1 Nov 2018 • Chirag Agarwal, Anh Nguyen, Dan Schonfeld
Intuitively, the center loss encourages DNNs to simultaneously learn a center for the deep features of each class and minimize the distances between the intra-class deep features and their corresponding class centers.
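A short sketch of the standard center loss being referred to, written as a PyTorch module; how it is weighted and combined with the classification loss is a placeholder rather than the paper's exact setup.

```python
import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    """Standard center loss: learn one center per class and pull each deep
    feature toward the center of its class."""

    def __init__(self, num_classes: int, feat_dim: int):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # features: [B, feat_dim], labels: [B] (class indices)
        return 0.5 * ((features - self.centers[labels]) ** 2).sum(dim=1).mean()

# Typically combined with the usual classification loss:
# loss = cross_entropy(logits, labels) + lambda_c * center_loss(features, labels)
```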
no code implementations • 5 Jun 2018 • Chirag Agarwal, Bo Dong, Dan Schonfeld, Anthony Hoogs
Instead of simply measuring a DNN's adversarial robustness in the input domain, as in previous works, the proposed NSS builds on a mathematical understanding of adversarial attacks and gives a more explicit explanation of robustness.
no code implementations • 21 May 2017 • Chirag Agarwal, Joe Klobusicky, Dan Schonfeld
We study a class of deep neural networks whose architectures form a directed acyclic graph (DAG).