Search Results for author: Chirag Agarwal

Found 34 papers, 16 papers with code

On the Impact of Fine-Tuning on Chain-of-Thought Reasoning

no code implementations 22 Nov 2024 Elita Lobo, Chirag Agarwal, Himabindu Lakkaraju

Large language models have emerged as powerful tools for general intelligence, showcasing advanced natural language processing capabilities that find applications across diverse domains.

Towards Operationalizing Right to Data Protection

no code implementations 13 Nov 2024 Abhinav Java, Simra Shahid, Chirag Agarwal

The widespread practice of indiscriminate data scraping to fine-tune language models (LMs) raises significant legal and ethical concerns, particularly regarding compliance with data protection laws such as the General Data Protection Regulation (GDPR).

On the Hardness of Faithful Chain-of-Thought Reasoning in Large Language Models

no code implementations 15 Jun 2024 Sree Harsha Tanneru, Dan Ley, Chirag Agarwal, Himabindu Lakkaraju

In this work, we explore the promise of three broad approaches commonly employed to steer the behavior of LLMs to enhance the faithfulness of the CoT reasoning they generate: in-context learning, fine-tuning, and activation editing.

In-Context Learning Question Answering

MedSafetyBench: Evaluating and Improving the Medical Safety of Large Language Models

1 code implementation 6 Mar 2024 Tessa Han, Aounon Kumar, Chirag Agarwal, Himabindu Lakkaraju

As large language models (LLMs) develop increasingly sophisticated capabilities and find applications in medical settings, it becomes important to assess their medical safety due to their far-reaching implications for personal and public health, patient safety, and human rights.

Ethics General Knowledge

Understanding the Effects of Iterative Prompting on Truthfulness

no code implementations 9 Feb 2024 Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju

The development of Large Language Models (LLMs) has notably transformed numerous sectors, offering impressive text generation capabilities.

Text Generation

Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models

no code implementations 7 Feb 2024 Chirag Agarwal, Sree Harsha Tanneru, Himabindu Lakkaraju

We highlight that the current trend towards increasing the plausibility of explanations, primarily driven by the demand for user-friendly interfaces, may come at the cost of diminishing their faithfulness.

Decision Making

Quantifying Uncertainty in Natural Language Explanations of Large Language Models

1 code implementation 6 Nov 2023 Sree Harsha Tanneru, Chirag Agarwal, Himabindu Lakkaraju

In this work, we make one of the first attempts at quantifying the uncertainty in explanations of LLMs.

In-Context Explainers: Harnessing LLMs for Explaining Black Box Models

1 code implementation 9 Oct 2023 Nicholas Kroeger, Dan Ley, Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju

Despite their effectiveness in enhancing the performance of LLMs on diverse language and tabular tasks, these methods have not been thoroughly explored for their potential to generate post hoc explanations.

Explainable artificial intelligence Explainable Artificial Intelligence (XAI) +2

On the Trade-offs between Adversarial Robustness and Actionable Explanations

no code implementations 28 Sep 2023 Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju

As machine learning models are increasingly being employed in various high-stakes settings, it becomes important to ensure that predictions of these models are not only adversarially robust, but also readily explainable to relevant stakeholders.

Adversarial Robustness

Certifying LLM Safety against Adversarial Prompting

1 code implementation 6 Sep 2023 Aounon Kumar, Chirag Agarwal, Suraj Srinivas, Aaron Jiaxun Li, Soheil Feizi, Himabindu Lakkaraju

We defend against three attack modes: i) adversarial suffix, where an adversarial sequence is appended at the end of a harmful prompt; ii) adversarial insertion, where the adversarial sequence is inserted anywhere in the middle of the prompt; and iii) adversarial infusion, where adversarial tokens are inserted at arbitrary positions in the prompt, not necessarily as a contiguous block.

Adversarial Attack Language Modelling
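
The three attack modes correspond to simple token-level operations on the prompt. The sketch below is a minimal, hypothetical illustration of how such adversarial prompts are assembled; the adversarial token sequence is a placeholder, and nothing here reproduces the paper's attack optimization or its certified defense.

```python
import random

# ADV_TOKENS stands in for an adversarially optimized token sequence; it is
# a placeholder for illustration, not the tokens used in the paper.
ADV_TOKENS = ["adv_tok_1", "adv_tok_2", "adv_tok_3"]

def adversarial_suffix(prompt_tokens):
    """Append the adversarial sequence to the end of the harmful prompt."""
    return prompt_tokens + ADV_TOKENS

def adversarial_insertion(prompt_tokens, position):
    """Insert the adversarial sequence as a contiguous block at `position`."""
    return prompt_tokens[:position] + ADV_TOKENS + prompt_tokens[position:]

def adversarial_infusion(prompt_tokens, rng=random):
    """Scatter adversarial tokens at arbitrary, not necessarily contiguous, positions."""
    result = list(prompt_tokens)
    for tok in ADV_TOKENS:
        result.insert(rng.randrange(len(result) + 1), tok)
    return result
```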

Counterfactual Explanation Policies in RL

no code implementations 25 Jul 2023 Shripad V. Deshmukh, Srivatsan R, Supriti Vijay, Jayakumar Subramanian, Chirag Agarwal

In this work, we present COUNTERPOL, the first framework to analyze RL policies using counterfactual explanations in the form of minimal changes to the policy that lead to the desired outcome.

counterfactual Counterfactual Explanation +2

Explaining RL Decisions with Trajectories

2 code implementations 6 May 2023 Shripad Vilasrao Deshmukh, Arpan Dasgupta, Balaji Krishnamurthy, Nan Jiang, Chirag Agarwal, Georgios Theocharous, Jayakumar Subramanian

To do so, we encode trajectories from the offline training data both individually and collectively (encoding a set of trajectories).

Attribute continuous-control +4
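
As a rough illustration of what encoding trajectories individually and collectively could look like, the sketch below averages per-step embeddings into a trajectory encoding and pools trajectory encodings into a set encoding. The step encoder and the mean-pooling choices are assumptions made for this sketch, not the architecture used in the paper.

```python
import numpy as np

def encode_trajectory(trajectory, embed_step):
    """Encode one trajectory (a list of (state, action) pairs) by averaging
    its per-step embeddings. `embed_step` is a placeholder for any learned
    step encoder."""
    step_embeddings = np.stack([embed_step(s, a) for s, a in trajectory])
    return step_embeddings.mean(axis=0)

def encode_trajectory_set(trajectories, embed_step):
    """Encode a set of trajectories collectively by pooling their individual
    encodings (mean pooling here, purely for illustration)."""
    encodings = np.stack([encode_trajectory(t, embed_step) for t in trajectories])
    return encodings.mean(axis=0)
```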

Explain like I am BM25: Interpreting a Dense Model's Ranked-List with a Sparse Approximation

1 code implementation 25 Apr 2023 Michael Llordes, Debasis Ganguly, Sumit Bhatia, Chirag Agarwal

Neural retrieval models (NRMs) have been shown to outperform their statistical counterparts owing to their ability to capture semantic meaning via dense document representations.

Retrieval
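
For reference, the sparse approximation referred to in the title builds on the standard BM25 ranking function; a minimal implementation is sketched below, with the usual default parameters k1 and b (not necessarily the values used in the paper).

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, doc_freqs, num_docs, avg_doc_len,
               k1=1.5, b=0.75):
    """Standard BM25 score of one document for a query.
    doc_freqs: dict mapping term -> number of documents containing it."""
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        if term not in tf:
            continue
        df = doc_freqs.get(term, 0)
        idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)
        numerator = tf[term] * (k1 + 1)
        denominator = tf[term] + k1 * (1 - b + b * doc_len / avg_doc_len)
        score += idf * numerator / denominator
    return score
```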

DeAR: Debiasing Vision-Language Models with Additive Residuals

no code implementations CVPR 2023 Ashish Seth, Mayur Hemani, Chirag Agarwal

These biases manifest as skewed similarity between the representations of specific text concepts and images of people from different identity groups, and therefore limit the usefulness of such models in real-world, high-stakes applications.

Attribute Benchmarking +2
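
The skewed similarity described above can be probed with a simple diagnostic: compare the average cosine similarity between a text-concept embedding and image embeddings drawn from different identity groups. The sketch below is only such an illustrative probe with placeholder embeddings; it is not the DeAR debiasing method itself.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def similarity_skew(text_embedding, group_a_embeddings, group_b_embeddings):
    """Difference in mean text-image similarity between two identity groups.
    A value far from zero indicates the skewed similarity described above."""
    sim_a = np.mean([cosine(text_embedding, e) for e in group_a_embeddings])
    sim_b = np.mean([cosine(text_embedding, e) for e in group_b_embeddings])
    return sim_a - sim_b
```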

GNNDelete: A General Strategy for Unlearning in Graph Neural Networks

1 code implementation 26 Feb 2023 Jiali Cheng, George Dasoulas, Huan He, Chirag Agarwal, Marinka Zitnik

Deleted Edge Consistency ensures that the influence of deleted elements is removed from both model weights and neighboring representations, while Neighborhood Influence guarantees that the remaining model knowledge is preserved after deletion.

Graph Neural Network
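
One plausible way to read the two properties as post-unlearning checks is sketched below: Deleted Edge Consistency compares predicted scores of deleted edges with those of node pairs that were never connected, and Neighborhood Influence measures representation drift on the remaining nodes. The score and distance functions here are placeholders, not the paper's exact formulation.

```python
import numpy as np

def deleted_edge_consistency(edge_scores_deleted, edge_scores_nonexistent):
    """Deleted edges should behave like unconnected pairs: compare the mean
    predicted edge score for deleted edges with that of node pairs that were
    never connected."""
    return abs(np.mean(edge_scores_deleted) - np.mean(edge_scores_nonexistent))

def neighborhood_influence(reprs_before, reprs_after, remaining_nodes):
    """Knowledge about the remaining graph should be preserved: average
    representation drift of nodes unaffected by the deletion."""
    drifts = [np.linalg.norm(reprs_after[n] - reprs_before[n])
              for n in remaining_nodes]
    return float(np.mean(drifts))
```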

Towards Estimating Transferability using Hard Subsets

no code implementations 17 Jan 2023 Tarun Ram Menta, Surgan Jandial, Akash Patil, Vimal KB, Saketh Bachu, Balaji Krishnamurthy, Vineeth N. Balasubramanian, Chirag Agarwal, Mausoom Sarkar

As transfer learning techniques are increasingly used to transfer knowledge from a source model to a target task, it becomes important to quantify which source models are suitable for a given target task without performing computationally expensive fine-tuning.

Transfer Learning

Towards Training GNNs using Explanation Directed Message Passing

1 code implementation 30 Nov 2022 Valentina Giunchiglia, Chirag Varun Shukla, Guadalupe Gonzalez, Chirag Agarwal

With the increasing use of Graph Neural Networks (GNNs) in critical real-world applications, several post hoc explanation methods have been proposed to understand their predictions.

Evaluating Explainability for Graph Neural Networks

1 code implementation 19 Aug 2022 Chirag Agarwal, Owen Queen, Himabindu Lakkaraju, Marinka Zitnik

As post hoc explanations are increasingly used to understand the behavior of graph neural networks (GNNs), it becomes crucial to evaluate the quality and reliability of GNN explanations.

OpenXAI: Towards a Transparent Evaluation of Model Explanations

2 code implementations 22 Jun 2022 Chirag Agarwal, Dan Ley, Satyapriya Krishna, Eshika Saxena, Martin Pawelczyk, Nari Johnson, Isha Puri, Marinka Zitnik, Himabindu Lakkaraju

OpenXAI comprises the following key components: (i) a flexible synthetic data generator and a collection of diverse real-world datasets, pre-trained models, and state-of-the-art feature attribution methods, and (ii) open-source implementations of eleven quantitative metrics for evaluating faithfulness, stability (robustness), and fairness of explanation methods, in turn providing comparisons of several explanation methods across a wide variety of metrics, models, and datasets.

Benchmarking Explainable Artificial Intelligence (XAI) +1
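
As an example of the kind of quantitative metric such a benchmark covers, the sketch below computes a simple top-k feature agreement between an attribution vector and a set of ground-truth important features. It is an illustration of the metric family only, not OpenXAI's API or implementation.

```python
import numpy as np

def feature_agreement(attributions, ground_truth_features, k=5):
    """Fraction of the top-k attributed features (by absolute attribution)
    that appear in the ground-truth set of important feature indices."""
    top_k = np.argsort(-np.abs(attributions))[:k]
    return len(set(top_k) & set(ground_truth_features)) / k
```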

Rethinking Stability for Attribution-based Explanations

no code implementations 14 Mar 2022 Chirag Agarwal, Nari Johnson, Martin Pawelczyk, Satyapriya Krishna, Eshika Saxena, Marinka Zitnik, Himabindu Lakkaraju

As attribution-based explanation methods are increasingly used to establish model trustworthiness in high-stakes situations, it is critical to ensure that these explanations are stable, e.g., robust to infinitesimal perturbations to an input.
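
A simple empirical proxy for this notion of stability is to apply small random perturbations to the input and measure how much the attribution vector changes. The sketch below is a generic illustration under that reading, not one of the specific stability metrics analyzed in the paper.

```python
import numpy as np

def empirical_instability(x, explain_fn, epsilon=1e-3, n_samples=20, seed=0):
    """Worst-case change in the attribution vector over small random
    perturbations of input x. `explain_fn` maps an input to its attributions."""
    rng = np.random.default_rng(seed)
    base = explain_fn(x)
    worst = 0.0
    for _ in range(n_samples):
        x_pert = x + epsilon * rng.standard_normal(x.shape)
        worst = max(worst, float(np.linalg.norm(explain_fn(x_pert) - base)))
    return worst
```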

A Tale Of Two Long Tails

1 code implementation 27 Jul 2021 Daniel D'souza, Zach Nussbaum, Chirag Agarwal, Sara Hooker

As machine learning models are increasingly employed to assist human decision-makers, it becomes critical to communicate the uncertainty associated with these model predictions.

Data Augmentation Vocal Bursts Valence Prediction

Exploring Counterfactual Explanations Through the Lens of Adversarial Examples: A Theoretical and Empirical Analysis

no code implementations 18 Jun 2021 Martin Pawelczyk, Chirag Agarwal, Shalmali Joshi, Sohini Upadhyay, Himabindu Lakkaraju

As machine learning (ML) models become more widely deployed in high-stakes applications, counterfactual explanations have emerged as key tools for providing actionable model explanations in practice.

counterfactual Counterfactual Explanation

Probing GNN Explainers: A Rigorous Theoretical and Empirical Analysis of GNN Explanation Methods

no code implementations 16 Jun 2021 Chirag Agarwal, Marinka Zitnik, Himabindu Lakkaraju

As Graph Neural Networks (GNNs) are increasingly being employed in critical real-world applications, several methods have been proposed in recent literature to explain the predictions of these models.

Fairness

Towards a Unified Framework for Fair and Stable Graph Representation Learning

3 code implementations 25 Feb 2021 Chirag Agarwal, Himabindu Lakkaraju, Marinka Zitnik

In this work, we establish a key connection between counterfactual fairness and stability and leverage it to propose a novel framework, NIFTY (uNIfying Fairness and stabiliTY), which can be used with any GNN to learn fair and stable representations.

counterfactual Fairness +1

Towards the Unification and Robustness of Perturbation and Gradient Based Explanations

no code implementations 21 Feb 2021 Sushant Agarwal, Shahin Jabbari, Chirag Agarwal, Sohini Upadhyay, Zhiwei Steven Wu, Himabindu Lakkaraju

As machine learning black boxes are increasingly being deployed in critical domains such as healthcare and criminal justice, there has been a growing emphasis on developing techniques for explaining these black boxes in a post hoc manner.

Estimating Example Difficulty Using Variance of Gradients

1 code implementation CVPR 2022 Chirag Agarwal, Daniel D'souza, Sara Hooker

In this work, we propose Variance of Gradients (VoG) as a valuable and efficient metric to rank data by difficulty and to surface a tractable subset of the most challenging examples for human-in-the-loop auditing.

Out-of-Distribution Detection
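
A rough sketch of the Variance of Gradients idea: collect each example's input gradient at several training checkpoints and score the example by the variance of those gradients across checkpoints. The choice of gradient target and the averaging below are illustrative assumptions rather than the paper's exact recipe.

```python
import numpy as np

def vog_scores(gradients_per_checkpoint):
    """gradients_per_checkpoint: array of shape (num_checkpoints, num_examples,
    ...) holding each example's input gradient at each checkpoint.
    Returns one difficulty score per example: the mean (over input dimensions)
    of the per-dimension gradient variance across checkpoints."""
    grads = np.asarray(gradients_per_checkpoint)
    variance = grads.var(axis=0)                      # per example, per dimension
    return variance.reshape(variance.shape[0], -1).mean(axis=1)
```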

The shape and simplicity biases of adversarially robust ImageNet-trained CNNs

1 code implementation 16 Jun 2020 Peijie Chen, Chirag Agarwal, Anh Nguyen

A growing number of similarities between human vision and convolutional neural networks (CNNs) have been revealed in the past few years.

Image Generation

SAM: The Sensitivity of Attribution Methods to Hyperparameters

1 code implementation CVPR 2020 Naman Bansal, Chirag Agarwal, Anh Nguyen

Attribution methods can provide powerful insights into the reasons for a classifier's decision.

Deep-URL: A Model-Aware Approach To Blind Deconvolution Based On Deep Unfolded Richardson-Lucy Network

no code implementations 3 Feb 2020 Chirag Agarwal, Shahin Khobahi, Arindam Bose, Mojtaba Soltanalian, Dan Schonfeld

The lack of interpretability in current deep learning models causes serious concerns as they are extensively used for various life-critical applications.

Deep Learning

Explaining image classifiers by removing input features using generative models

1 code implementation 9 Oct 2019 Chirag Agarwal, Anh Nguyen

Perturbation-based explanation methods often measure the contribution of an input feature to an image classifier's outputs by heuristically removing it via e.g. blurring, adding noise, or graying out, which often produce unrealistic, out-of-distribution samples.

counterfactual Object Localization
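
The heuristic removal referred to above can be illustrated with a simple occlusion-style attribution: gray out a patch, re-run the classifier, and record the drop in the target class score. The sketch below shows that generic baseline (the kind of removal the paper replaces with generative in-filling); the model function and patch settings are placeholders.

```python
import numpy as np

def occlusion_attribution(image, model_fn, target_class, patch=8, fill=0.5):
    """Attribution by graying out square patches and measuring the drop in
    the target class score. `model_fn` maps an image to class probabilities."""
    h, w = image.shape[:2]
    base_score = model_fn(image)[target_class]
    heatmap = np.zeros((h // patch, w // patch))
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = fill   # "gray out" the patch
            heatmap[i // patch, j // patch] = base_score - model_fn(occluded)[target_class]
    return heatmap
```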

Removing input features via a generative model to explain their attributions to classifier's decisions

no code implementations 25 Sep 2019 Chirag Agarwal, Dan Schonfeld, Anh Nguyen

Interpretability methods often measure the contribution of an input feature to an image classifier's decisions by heuristically removing it via e.g. blurring, adding noise, or graying out, which often produce unrealistic, out-of-distribution samples.

counterfactual

Improving Adversarial Robustness by Encouraging Discriminative Features

no code implementations 1 Nov 2018 Chirag Agarwal, Anh Nguyen, Dan Schonfeld

Intuitively, the center loss encourages DNNs to simultaneously learn a center for the deep features of each class and minimize the distances between the intra-class deep features and their corresponding class centers.

Adversarial Robustness
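
The center loss referred to above (introduced by Wen et al., 2016) has a simple closed form: half the squared distance between each deep feature and the learnable center of its class, averaged over the batch. A minimal sketch, assuming features and centers are plain arrays:

```python
import numpy as np

def center_loss(features, labels, centers):
    """Center loss: half the mean squared distance between each deep feature
    and the center of its class.
    features: (N, D) deep features, labels: (N,) integer class labels,
    centers:  (C, D) one learnable center per class."""
    diffs = features - centers[labels]              # (N, D) feature-to-center offsets
    return 0.5 * np.mean(np.sum(diffs ** 2, axis=1))
```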

An Explainable Adversarial Robustness Metric for Deep Learning Neural Networks

no code implementations 5 Jun 2018 Chirag Agarwal, Bo Dong, Dan Schonfeld, Anthony Hoogs

Instead of simply measuring a DNN's adversarial robustness in the input domain, as in previous works, the proposed NSS is built on a mathematical understanding of the adversarial attack and gives a more explicit explanation of the model's robustness.

Adversarial Attack Adversarial Robustness +4

Convergence of backpropagation with momentum for network architectures with skip connections

no code implementations 21 May 2017 Chirag Agarwal, Joe Klobusicky, Dan Schonfeld

We study a class of deep neural networks whose architectures form a directed acyclic graph (DAG).
