no code implementations • 22 Nov 2024 • Elita Lobo, Chirag Agarwal, Himabindu Lakkaraju
Large language models have emerged as powerful tools for general intelligence, showcasing advanced natural language processing capabilities that find applications across diverse domains.
no code implementations • 13 Nov 2024 • Abhinav Java, Simra Shahid, Chirag Agarwal
The widespread practice of indiscriminate data scraping to fine-tune language models (LMs) raises significant legal and ethical concerns, particularly regarding compliance with data protection laws such as the General Data Protection Regulation (GDPR).
no code implementations • 15 Jun 2024 • Sree Harsha Tanneru, Dan Ley, Chirag Agarwal, Himabindu Lakkaraju
In this work, we explore the promise of three broad approaches commonly employed to steer the behavior of LLMs to enhance the faithfulness of the CoT reasoning generated by LLMs: in-context learning, fine-tuning, and activation editing.
1 code implementation • 6 Mar 2024 • Tessa Han, Aounon Kumar, Chirag Agarwal, Himabindu Lakkaraju
As large language models (LLMs) develop increasingly sophisticated capabilities and find applications in medical settings, it becomes important to assess their medical safety due to their far-reaching implications for personal and public health, patient safety, and human rights.
no code implementations • 9 Feb 2024 • Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju
The development of Large Language Models (LLMs) has notably transformed numerous sectors, offering impressive text generation capabilities.
no code implementations • 7 Feb 2024 • Chirag Agarwal, Sree Harsha Tanneru, Himabindu Lakkaraju
We highlight that the current trend towards increasing the plausibility of explanations, primarily driven by the demand for user-friendly interfaces, may come at the cost of diminishing their faithfulness.
1 code implementation • 6 Nov 2023 • Sree Harsha Tanneru, Chirag Agarwal, Himabindu Lakkaraju
In this work, we make one of the first attempts at quantifying the uncertainty in explanations of LLMs.
1 code implementation • 9 Oct 2023 • Nicholas Kroeger, Dan Ley, Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju
Despite their effectiveness in enhancing the performance of LLMs on diverse language and tabular tasks, these methods have not been thoroughly explored for their potential to generate post hoc explanations.
Explainable artificial intelligence Explainable Artificial Intelligence (XAI) +2
no code implementations • 28 Sep 2023 • Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju
As machine learning models are increasingly being employed in various high-stakes settings, it becomes important to ensure that predictions of these models are not only adversarially robust, but also readily explainable to relevant stakeholders.
1 code implementation • 6 Sep 2023 • Aounon Kumar, Chirag Agarwal, Suraj Srinivas, Aaron Jiaxun Li, Soheil Feizi, Himabindu Lakkaraju
We defend against three attack modes: i) adversarial suffix, where an adversarial sequence is appended at the end of a harmful prompt; ii) adversarial insertion, where the adversarial sequence is inserted anywhere in the middle of the prompt; and iii) adversarial infusion, where adversarial tokens are inserted at arbitrary positions in the prompt, not necessarily as a contiguous block.
no code implementations • 25 Jul 2023 • Shripad V. Deshmukh, Srivatsan R, Supriti Vijay, Jayakumar Subramanian, Chirag Agarwal
In this work, we present COUNTERPOL, the first framework to analyze RL policies using counterfactual explanations in the form of minimal changes to the policy that lead to the desired outcome.
2 code implementations • 6 May 2023 • Shripad Vilasrao Deshmukh, Arpan Dasgupta, Balaji Krishnamurthy, Nan Jiang, Chirag Agarwal, Georgios Theocharous, Jayakumar Subramanian
To do so, we encode trajectories in offline training data individually as well as collectively (encoding a set of trajectories).
1 code implementation • 25 Apr 2023 • Michael Llordes, Debasis Ganguly, Sumit Bhatia, Chirag Agarwal
Neural retrieval models (NRMs) have been shown to outperform their statistical counterparts owing to their ability to capture semantic meaning via dense document representations.
no code implementations • CVPR 2023 • Ashish Seth, Mayur Hemani, Chirag Agarwal
These biases manifest as the skewed similarity between the representations for specific text concepts and images of people of different identity groups and, therefore, limit the usefulness of such models in real-world high-stakes applications.
1 code implementation • 26 Feb 2023 • Jiali Cheng, George Dasoulas, Huan He, Chirag Agarwal, Marinka Zitnik
Deleted Edge Consistency ensures that the influence of deleted elements is removed from both model weights and neighboring representations, while Neighborhood Influence guarantees that the remaining model knowledge is preserved after deletion.
no code implementations • 17 Jan 2023 • Tarun Ram Menta, Surgan Jandial, Akash Patil, Vimal KB, Saketh Bachu, Balaji Krishnamurthy, Vineeth N. Balasubramanian, Chirag Agarwal, Mausoom Sarkar
As transfer learning techniques are increasingly used to transfer knowledge from the source model to the target task, it becomes important to quantify which source models are suitable for a given target task without performing computationally expensive fine tuning.
1 code implementation • 30 Nov 2022 • Valentina Giunchiglia, Chirag Varun Shukla, Guadalupe Gonzalez, Chirag Agarwal
With the increasing use of Graph Neural Networks (GNNs) in critical real-world applications, several post hoc explanation methods have been proposed to understand their predictions.
1 code implementation • 19 Aug 2022 • Chirag Agarwal, Owen Queen, Himabindu Lakkaraju, Marinka Zitnik
As post hoc explanations are increasingly used to understand the behavior of graph neural networks (GNNs), it becomes crucial to evaluate the quality and reliability of GNN explanations.
2 code implementations • 22 Jun 2022 • Chirag Agarwal, Dan Ley, Satyapriya Krishna, Eshika Saxena, Martin Pawelczyk, Nari Johnson, Isha Puri, Marinka Zitnik, Himabindu Lakkaraju
OpenXAI comprises of the following key components: (i) a flexible synthetic data generator and a collection of diverse real-world datasets, pre-trained models, and state-of-the-art feature attribution methods, and (ii) open-source implementations of eleven quantitative metrics for evaluating faithfulness, stability (robustness), and fairness of explanation methods, in turn providing comparisons of several explanation methods across a wide variety of metrics, models, and datasets.
no code implementations • 14 Mar 2022 • Chirag Agarwal, Nari Johnson, Martin Pawelczyk, Satyapriya Krishna, Eshika Saxena, Marinka Zitnik, Himabindu Lakkaraju
As attribution-based explanation methods are increasingly used to establish model trustworthiness in high-stakes situations, it is critical to ensure that these explanations are stable, e. g., robust to infinitesimal perturbations to an input.
1 code implementation • 27 Jul 2021 • Daniel D'souza, Zach Nussbaum, Chirag Agarwal, Sara Hooker
As machine learning models are increasingly employed to assist human decision-makers, it becomes critical to communicate the uncertainty associated with these model predictions.
no code implementations • 18 Jun 2021 • Martin Pawelczyk, Chirag Agarwal, Shalmali Joshi, Sohini Upadhyay, Himabindu Lakkaraju
As machine learning (ML) models become more widely deployed in high-stakes applications, counterfactual explanations have emerged as key tools for providing actionable model explanations in practice.
no code implementations • 16 Jun 2021 • Chirag Agarwal, Marinka Zitnik, Himabindu Lakkaraju
As Graph Neural Networks (GNNs) are increasingly being employed in critical real-world applications, several methods have been proposed in recent literature to explain the predictions of these models.
3 code implementations • 25 Feb 2021 • Chirag Agarwal, Himabindu Lakkaraju, Marinka Zitnik
In this work, we establish a key connection between counterfactual fairness and stability and leverage it to propose a novel framework, NIFTY (uNIfying Fairness and stabiliTY), which can be used with any GNN to learn fair and stable representations.
no code implementations • 21 Feb 2021 • Sushant Agarwal, Shahin Jabbari, Chirag Agarwal, Sohini Upadhyay, Zhiwei Steven Wu, Himabindu Lakkaraju
As machine learning black boxes are increasingly being deployed in critical domains such as healthcare and criminal justice, there has been a growing emphasis on developing techniques for explaining these black boxes in a post hoc manner.
1 code implementation • CVPR 2022 • Chirag Agarwal, Daniel D'souza, Sara Hooker
In this work, we propose Variance of Gradients (VoG) as a valuable and efficient metric to rank data by difficulty and to surface a tractable subset of the most challenging examples for human-in-the-loop auditing.
1 code implementation • 16 Jun 2020 • Peijie Chen, Chirag Agarwal, Anh Nguyen
Increasingly more similarities between human vision and convolutional neural networks (CNNs) have been revealed in the past few years.
1 code implementation • CVPR 2020 • Naman Bansal, Chirag Agarwal, Anh Nguyen
Attribution methods can provide powerful insights into the reasons for a classifier's decision.
no code implementations • 3 Feb 2020 • Chirag Agarwal, Shahin Khobahi, Arindam Bose, Mojtaba Soltanalian, Dan Schonfeld
The lack of interpretability in current deep learning models causes serious concerns as they are extensively used for various life-critical applications.
1 code implementation • 9 Oct 2019 • Chirag Agarwal, Anh Nguyen
Perturbation-based explanation methods often measure the contribution of an input feature to an image classifier's outputs by heuristically removing it via e. g. blurring, adding noise, or graying out, which often produce unrealistic, out-of-samples.
no code implementations • 25 Sep 2019 • Chirag Agarwal, Dan Schonfeld, Anh Nguyen
Interpretability methods often measure the contribution of an input feature to an image classifier's decisions by heuristically removing it via e. g. blurring, adding noise, or graying out, which often produce unrealistic, out-of-samples.
no code implementations • 1 Nov 2018 • Chirag Agarwal, Anh Nguyen, Dan Schonfeld
Intuitively, the center loss encourages DNNs to simultaneously learns a center for the deep features of each class, and minimize the distances between the intra-class deep features and their corresponding class centers.
no code implementations • 5 Jun 2018 • Chirag Agarwal, Bo Dong, Dan Schonfeld, Anthony Hoogs
Instead of simply measuring a DNN's adversarial robustness in the input domain, as previous works, the proposed NSS is built on top of insightful mathematical understanding of the adversarial attack and gives a more explicit explanation of the robustness.
no code implementations • 21 May 2017 • Chirag Agarwal, Joe Klobusicky, Dan Schonfeld
We study a class of deep neural networks with networks that form a directed acyclic graph (DAG).