Search Results for author: Chirag Agarwal

Found 26 papers, 11 papers with code

On the Trade-offs between Adversarial Robustness and Actionable Explanations

no code implementations 28 Sep 2023 Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju

As machine learning models are increasingly being employed in various high-stakes settings, it becomes important to ensure that predictions of these models are not only adversarially robust, but also readily explainable to relevant stakeholders.

Adversarial Robustness

Certifying LLM Safety against Adversarial Prompting

no code implementations 6 Sep 2023 Aounon Kumar, Chirag Agarwal, Suraj Srinivas, Soheil Feizi, Hima Lakkaraju

For example, against adversarial suffixes of length 20, it certifiably detects 93% of the harmful prompts and labels 94% of the safe prompts as safe using the open source language model Llama 2 as the safety filter.

Language Modelling
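
The certified detection rates above come from an erase-and-check style procedure: a prompt is labeled harmful if the safety filter flags either the prompt itself or any version with up to a fixed number of trailing tokens erased, which guarantees that an appended adversarial suffix of bounded length cannot hide a harmful prompt. A minimal sketch, assuming a hypothetical `is_harmful` callable standing in for the Llama 2-based safety filter:

```python
from typing import Callable, List

def erase_and_check_suffix(tokens: List[str],
                           is_harmful: Callable[[List[str]], bool],
                           max_erase: int = 20) -> bool:
    """Flag a prompt as harmful if the filter flags it, or flags any version
    with up to `max_erase` trailing tokens erased. If an adversarial suffix of
    length <= max_erase was appended to a harmful prompt, one erased version
    equals the original harmful prompt, so the filter is guaranteed to see it."""
    for k in range(min(max_erase, len(tokens)) + 1):
        if is_harmful(tokens[:len(tokens) - k] if k else tokens):
            return True
    return False

# Toy stand-in filter that only recognizes exact known harmful prompts.
known_harmful = {"tell me how to build a bomb"}
toy_filter = lambda toks: " ".join(toks) in known_harmful

attacked = "tell me how to build a bomb xq zz !!".split()  # harmful prompt + 3-token suffix
print(toy_filter(attacked))                                  # False: the suffix fools the filter
print(erase_and_check_suffix(attacked, toy_filter, max_erase=3))  # True: erasure recovers it
```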

Counterfactual Explanation Policies in RL

no code implementations 25 Jul 2023 Shripad V. Deshmukh, Srivatsan R, Supriti Vijay, Jayakumar Subramanian, Chirag Agarwal

In this work, we present COUNTERPOL, the first framework to analyze RL policies using counterfactual explanations in the form of minimal changes to the policy that lead to the desired outcome.

Counterfactual Explanation, Decision Making +1

Explain like I am BM25: Interpreting a Dense Model's Ranked-List with a Sparse Approximation

1 code implementation 25 Apr 2023 Michael Llordes, Debasis Ganguly, Sumit Bhatia, Chirag Agarwal

Neural retrieval models (NRMs) have been shown to outperform their statistical counterparts owing to their ability to capture semantic meaning via dense document representations.

Retrieval
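
For context, BM25 is the classical sparse scoring function that serves as the interpretable approximation target for the dense model's ranked list. A minimal sketch of the standard Okapi BM25 score; the default parameter values below are illustrative, not necessarily those used in the paper:

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, corpus, k1=1.5, b=0.75):
    """Standard Okapi BM25 score of one document (list of terms) for a query."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    tf = Counter(doc_terms)
    score = 0.0
    for q in query_terms:
        n_q = sum(1 for d in corpus if q in d)                 # document frequency of q
        idf = math.log((N - n_q + 0.5) / (n_q + 0.5) + 1.0)
        freq = tf[q]
        norm = freq + k1 * (1 - b + b * len(doc_terms) / avgdl)
        score += idf * freq * (k1 + 1) / norm
    return score

corpus = [["neural", "retrieval", "models"], ["sparse", "bm25", "retrieval"]]
print(bm25_score(["retrieval"], corpus[1], corpus))
```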

DeAR: Debiasing Vision-Language Models with Additive Residuals

no code implementations CVPR 2023 Ashish Seth, Mayur Hemani, Chirag Agarwal

These biases manifest as the skewed similarity between the representations for specific text concepts and images of people of different identity groups and, therefore, limit the usefulness of such models in real-world high-stakes applications.

Benchmarking, Fairness +1

GNNDelete: A General Strategy for Unlearning in Graph Neural Networks

1 code implementation 26 Feb 2023 Jiali Cheng, George Dasoulas, Huan He, Chirag Agarwal, Marinka Zitnik

Deleted Edge Consistency ensures that the influence of deleted elements is removed from both model weights and neighboring representations, while Neighborhood Influence guarantees that the remaining model knowledge is preserved after deletion.
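
A hedged sketch of the two properties as loss terms (illustrative PyTorch, not the authors' exact objective): Deleted Edge Consistency pushes the scores of deleted edges toward those of random non-edges, while Neighborhood Influence keeps the representations of the affected neighborhood close to their pre-deletion values.

```python
import torch
import torch.nn.functional as F

def deleted_edge_consistency(z, deleted_edges, nonedges):
    """Make deleted edges look like edges that never existed: match dot-product
    scores of deleted node pairs to those of random non-edges."""
    s_del = (z[deleted_edges[0]] * z[deleted_edges[1]]).sum(dim=-1)
    s_non = (z[nonedges[0]] * z[nonedges[1]]).sum(dim=-1)
    return F.mse_loss(s_del, s_non)

def neighborhood_influence(z_after, z_before, neighbor_idx):
    """Keep post-deletion representations of the deleted edges' neighborhood
    close to their pre-deletion values, preserving remaining model knowledge."""
    return F.mse_loss(z_after[neighbor_idx], z_before[neighbor_idx])

# z_before: node embeddings from the original GNN; z_after: embeddings after a
# learned deletion operator is applied (both assumed precomputed here).
z_before = torch.randn(100, 16)
z_after = z_before + 0.01 * torch.randn(100, 16)
deleted = torch.randint(0, 100, (2, 5))      # 5 deleted edges (node-index pairs)
nonedges = torch.randint(0, 100, (2, 5))     # 5 random non-edges
neigh = torch.unique(deleted)                # nodes touched by the deletions

loss = deleted_edge_consistency(z_after, deleted, nonedges) \
     + neighborhood_influence(z_after, z_before, neigh)
print(loss.item())
```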

Towards Estimating Transferability using Hard Subsets

no code implementations 17 Jan 2023 Tarun Ram Menta, Surgan Jandial, Akash Patil, Vimal KB, Saketh Bachu, Balaji Krishnamurthy, Vineeth N. Balasubramanian, Chirag Agarwal, Mausoom Sarkar

As transfer learning techniques are increasingly used to transfer knowledge from the source model to the target task, it becomes important to quantify which source models are suitable for a given target task without performing computationally expensive fine-tuning.

Transfer Learning

Towards Training GNNs using Explanation Directed Message Passing

1 code implementation 30 Nov 2022 Valentina Giunchiglia, Chirag Varun Shukla, Guadalupe Gonzalez, Chirag Agarwal

With the increasing use of Graph Neural Networks (GNNs) in critical real-world applications, several post hoc explanation methods have been proposed to understand their predictions.

Evaluating Explainability for Graph Neural Networks

1 code implementation 19 Aug 2022 Chirag Agarwal, Owen Queen, Himabindu Lakkaraju, Marinka Zitnik

As post hoc explanations are increasingly used to understand the behavior of graph neural networks (GNNs), it becomes crucial to evaluate the quality and reliability of GNN explanations.

OpenXAI: Towards a Transparent Evaluation of Model Explanations

1 code implementation 22 Jun 2022 Chirag Agarwal, Satyapriya Krishna, Eshika Saxena, Martin Pawelczyk, Nari Johnson, Isha Puri, Marinka Zitnik, Himabindu Lakkaraju

OpenXAI comprises the following key components: (i) a flexible synthetic data generator and a collection of diverse real-world datasets, pre-trained models, and state-of-the-art feature attribution methods, (ii) open-source implementations of twenty-two quantitative metrics for evaluating faithfulness, stability (robustness), and fairness of explanation methods, and (iii) the first ever public XAI leaderboards to benchmark explanations.

Benchmarking, Explainable Artificial Intelligence (XAI) +1
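
One concrete example of a metric in this family is top-k feature agreement: the overlap between the most important features of two explanations (for instance, a method's attribution versus the synthetic generator's ground-truth importances). A minimal illustrative sketch, not OpenXAI's actual API:

```python
import numpy as np

def topk_feature_agreement(attr_a: np.ndarray, attr_b: np.ndarray, k: int = 5) -> float:
    """Fraction of the top-k most important features (by absolute attribution)
    that two explanations have in common."""
    top_a = set(np.argsort(-np.abs(attr_a))[:k])
    top_b = set(np.argsort(-np.abs(attr_b))[:k])
    return len(top_a & top_b) / k

rng = np.random.default_rng(0)
explanation = rng.normal(size=20)                       # a feature-attribution vector
ground_truth = explanation + 0.1 * rng.normal(size=20)  # stand-in reference importances
print(topk_feature_agreement(explanation, ground_truth, k=5))
```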

Rethinking Stability for Attribution-based Explanations

no code implementations 14 Mar 2022 Chirag Agarwal, Nari Johnson, Martin Pawelczyk, Satyapriya Krishna, Eshika Saxena, Marinka Zitnik, Himabindu Lakkaraju

As attribution-based explanation methods are increasingly used to establish model trustworthiness in high-stakes situations, it is critical to ensure that these explanations are stable, e.g., robust to infinitesimal perturbations to an input.
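
One simple way to probe this empirically (illustrative only; the paper proposes relative stability metrics rather than this exact quantity) is to perturb an input slightly many times and track the worst-case change in the attribution relative to the change in the input:

```python
import numpy as np

def max_attribution_instability(x, explain_fn, eps=1e-3, n_samples=20, seed=0):
    """Worst-case ratio ||phi(x') - phi(x)|| / ||x' - x|| over small random
    perturbations x' of x, as a rough empirical (in)stability estimate."""
    rng = np.random.default_rng(seed)
    base = explain_fn(x)
    worst = 0.0
    for _ in range(n_samples):
        delta = eps * rng.normal(size=x.shape)
        ratio = np.linalg.norm(explain_fn(x + delta) - base) / np.linalg.norm(delta)
        worst = max(worst, ratio)
    return worst

# Toy example: the "attribution" is the gradient of a simple quadratic score.
explain = lambda x: 2.0 * x
print(max_attribution_instability(np.ones(10), explain))
```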

A Tale Of Two Long Tails

1 code implementation 27 Jul 2021 Daniel D'souza, Zach Nussbaum, Chirag Agarwal, Sara Hooker

As machine learning models are increasingly employed to assist human decision-makers, it becomes critical to communicate the uncertainty associated with these model predictions.

Data Augmentation, Vocal Bursts Valence Prediction

Exploring Counterfactual Explanations Through the Lens of Adversarial Examples: A Theoretical and Empirical Analysis

no code implementations 18 Jun 2021 Martin Pawelczyk, Chirag Agarwal, Shalmali Joshi, Sohini Upadhyay, Himabindu Lakkaraju

As machine learning (ML) models become more widely deployed in high-stakes applications, counterfactual explanations have emerged as key tools for providing actionable model explanations in practice.

Counterfactual Explanation
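
The counterfactual explanations studied here are commonly produced by gradient-based search, which is precisely what makes the connection to adversarial examples natural. A minimal sketch of a standard score-based counterfactual objective (a Wachter-style formulation, not necessarily the specific generators analyzed in the paper), assuming a differentiable classifier f:

```python
import torch

def gradient_counterfactual(f, x, target=1.0, lam=0.1, steps=200, lr=0.05):
    """Find x_cf close to x whose predicted score reaches `target`:
    minimize (f(x_cf) - target)^2 + lam * ||x_cf - x||_1."""
    x_cf = x.clone().requires_grad_(True)
    opt = torch.optim.Adam([x_cf], lr=lr)
    for _ in range(steps):
        loss = (f(x_cf) - target) ** 2 + lam * (x_cf - x).abs().sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return x_cf.detach()

# Toy linear "classifier" returning a score in [0, 1].
w = torch.tensor([1.0, -2.0, 0.5])
f = lambda z: torch.sigmoid(z @ w)
x = torch.tensor([0.0, 1.0, 0.0])
print(f(x).item(), f(gradient_counterfactual(f, x)).item())
```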

Probing GNN Explainers: A Rigorous Theoretical and Empirical Analysis of GNN Explanation Methods

no code implementations 16 Jun 2021 Chirag Agarwal, Marinka Zitnik, Himabindu Lakkaraju

As Graph Neural Networks (GNNs) are increasingly being employed in critical real-world applications, several methods have been proposed in recent literature to explain the predictions of these models.

Fairness

Towards a Unified Framework for Fair and Stable Graph Representation Learning

2 code implementations 25 Feb 2021 Chirag Agarwal, Himabindu Lakkaraju, Marinka Zitnik

In this work, we establish a key connection between counterfactual fairness and stability and leverage it to propose a novel framework, NIFTY (uNIfying Fairness and stabiliTY), which can be used with any GNN to learn fair and stable representations.

Fairness, Graph Representation Learning
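
A hedged sketch of the core idea (illustrative, not the exact NIFTY objective): build a counterfactual view by flipping the sensitive attribute and a noisy view by perturbing features, then penalize the distance between the encoder's representations of the original and augmented views so that they are simultaneously counterfactually fair and stable. The encoder below is a stand-in for a GNN layer, with the graph structure omitted:

```python
import torch
import torch.nn.functional as F

def fair_stable_regularizer(encode, x, sens_idx, noise_std=0.01):
    """Similarity regularizer over two augmented views of node features `x`:
    (i) a counterfactual view with the sensitive attribute flipped,
    (ii) a noisy view with small Gaussian feature noise."""
    z = encode(x)
    x_cf = x.clone()
    x_cf[:, sens_idx] = 1.0 - x_cf[:, sens_idx]        # flip binary sensitive attribute
    x_noisy = x + noise_std * torch.randn_like(x)
    z_cf, z_noisy = encode(x_cf), encode(x_noisy)
    # 1 - cosine similarity, averaged over nodes and the two views
    d_cf = 1 - F.cosine_similarity(z, z_cf, dim=-1).mean()
    d_noisy = 1 - F.cosine_similarity(z, z_noisy, dim=-1).mean()
    return 0.5 * (d_cf + d_noisy)

encoder = torch.nn.Linear(8, 4)                         # stand-in for a GNN encoder
x = torch.rand(32, 8)
x[:, 0] = (x[:, 0] > 0.5).float()                       # make the sensitive column binary
print(fair_stable_regularizer(encoder, x, sens_idx=0).item())
```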

Towards the Unification and Robustness of Perturbation and Gradient Based Explanations

no code implementations 21 Feb 2021 Sushant Agarwal, Shahin Jabbari, Chirag Agarwal, Sohini Upadhyay, Zhiwei Steven Wu, Himabindu Lakkaraju

As machine learning black boxes are increasingly being deployed in critical domains such as healthcare and criminal justice, there has been a growing emphasis on developing techniques for explaining these black boxes in a post hoc manner.

Estimating Example Difficulty Using Variance of Gradients

1 code implementation CVPR 2022 Chirag Agarwal, Daniel D'souza, Sara Hooker

In this work, we propose Variance of Gradients (VoG) as a valuable and efficient metric to rank data by difficulty and to surface a tractable subset of the most challenging examples for human-in-the-loop auditing.

Out-of-Distribution Detection
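
A minimal sketch of the VoG computation: take the gradient of the target-class score with respect to the input at several training checkpoints, then average the per-pixel standard deviation of those gradients (illustrative PyTorch; exact normalization choices may differ from the paper's):

```python
import torch

def variance_of_gradients(checkpoints, x, label):
    """Per-example VoG: standard deviation over checkpoints of the input-gradient
    of the target-class score, averaged over input dimensions (pixels)."""
    grads = []
    for model in checkpoints:
        x_req = x.clone().requires_grad_(True)
        score = model(x_req)[label]          # pre-softmax score of the true class
        score.backward()
        grads.append(x_req.grad.detach())
    g = torch.stack(grads)                   # (num_checkpoints, *x.shape)
    return g.std(dim=0, unbiased=False).mean().item()

# Toy example with three "checkpoints" of a linear model over 16-dim inputs.
checkpoints = [torch.nn.Linear(16, 3) for _ in range(3)]
x = torch.rand(16)
print(variance_of_gradients(checkpoints, x, label=0))
```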

The shape and simplicity biases of adversarially robust ImageNet-trained CNNs

1 code implementation 16 Jun 2020 Peijie Chen, Chirag Agarwal, Anh Nguyen

Increasingly more similarities between human vision and convolutional neural networks (CNNs) have been revealed in the past few years.

Image Generation

SAM: The Sensitivity of Attribution Methods to Hyperparameters

1 code implementation CVPR 2020 Naman Bansal, Chirag Agarwal, Anh Nguyen

Attribution methods can provide powerful insights into the reasons for a classifier's decision.
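
The sensitivity in question can be probed directly: compute the same attribution method under two hyperparameter settings and measure how similar the resulting maps are. A minimal sketch using SmoothGrad's noise level as the hyperparameter and Pearson correlation as a simple stand-in similarity measure (not necessarily the metrics used in the paper):

```python
import torch

def smoothgrad(model, x, label, noise_std, n_samples=25, seed=0):
    """SmoothGrad attribution: average input-gradient over noisy copies of x."""
    torch.manual_seed(seed)
    grads = torch.zeros_like(x)
    for _ in range(n_samples):
        x_noisy = (x + noise_std * torch.randn_like(x)).requires_grad_(True)
        model(x_noisy)[label].backward()
        grads += x_noisy.grad
    return grads / n_samples

def attribution_similarity(a, b):
    """Pearson correlation between two flattened attribution maps."""
    return torch.corrcoef(torch.stack([a.flatten(), b.flatten()]))[0, 1].item()

model = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU(), torch.nn.Linear(8, 3))
x = torch.rand(16)
low = smoothgrad(model, x, label=0, noise_std=0.05)
high = smoothgrad(model, x, label=0, noise_std=0.5)
print(attribution_similarity(low, high))  # lower values indicate stronger hyperparameter sensitivity
```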

Deep-URL: A Model-Aware Approach To Blind Deconvolution Based On Deep Unfolded Richardson-Lucy Network

no code implementations 3 Feb 2020 Chirag Agarwal, Shahin Khobahi, Arindam Bose, Mojtaba Soltanalian, Dan Schonfeld

The lack of interpretability in current deep learning models causes serious concerns as they are extensively used for various life-critical applications.
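
For context, the classical Richardson-Lucy iteration that the proposed network unfolds updates the image estimate multiplicatively using the blur kernel and its flipped version. A minimal NumPy/SciPy sketch of that classical update only; the paper's model-aware network unfolds these iterations into learnable layers rather than assuming a known kernel:

```python
import numpy as np
from scipy.signal import convolve2d

def richardson_lucy(observed, psf, n_iter=30, eps=1e-12):
    """Classical Richardson-Lucy deconvolution: multiplicative updates toward
    the maximum-likelihood estimate under Poisson noise."""
    estimate = np.full_like(observed, 0.5)
    psf_flipped = psf[::-1, ::-1]
    for _ in range(n_iter):
        blurred = convolve2d(estimate, psf, mode="same")
        ratio = observed / (blurred + eps)
        estimate = estimate * convolve2d(ratio, psf_flipped, mode="same")
    return estimate

# Toy example: blur a random image with a box kernel and deconvolve it.
rng = np.random.default_rng(0)
psf = np.ones((3, 3)) / 9.0
sharp = rng.random((32, 32))
blurry = convolve2d(sharp, psf, mode="same")
print(np.abs(richardson_lucy(blurry, psf) - sharp).mean())
```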

Explaining image classifiers by removing input features using generative models

1 code implementation 9 Oct 2019 Chirag Agarwal, Anh Nguyen

Perturbation-based explanation methods often measure the contribution of an input feature to an image classifier's outputs by heuristically removing it via e.g. blurring, adding noise, or graying out, which often produce unrealistic, out-of-distribution samples.

Object Localization
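
A minimal sketch of the perturbation-based attribution scheme described above, with a heavy blur standing in for the heuristic removal operator; the paper's contribution is to replace such heuristics with a generative model that in-fills the removed region with realistic content (indicated here only by the pluggable `remove` argument):

```python
import torch
import torch.nn.functional as F

def removal_attribution(model, x, label, patch=4, remove=None):
    """Attribution map: drop in the target-class score when each patch of the
    image `x` (C, H, W) is 'removed'. `remove(img, y0, x0, p)` returns a copy of
    img with that patch replaced; the default is a blur-based heuristic, whereas
    the paper in-fills the patch with a generative model instead."""
    if remove is None:
        def remove(img, y0, x0, p):
            blurred = F.avg_pool2d(img.unsqueeze(0), 7, stride=1, padding=3).squeeze(0)
            out = img.clone()
            out[:, y0:y0 + p, x0:x0 + p] = blurred[:, y0:y0 + p, x0:x0 + p]
            return out
    with torch.no_grad():
        base = model(x.unsqueeze(0))[0, label]
        _, H, W = x.shape
        heat = torch.zeros(H // patch, W // patch)
        for i in range(H // patch):
            for j in range(W // patch):
                x_removed = remove(x, i * patch, j * patch, patch)
                heat[i, j] = base - model(x_removed.unsqueeze(0))[0, label]
    return heat

# Toy classifier over 3x16x16 images.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 16 * 16, 5))
print(removal_attribution(model, torch.rand(3, 16, 16), label=2))
```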

Removing input features via a generative model to explain their attributions to classifier's decisions

no code implementations 25 Sep 2019 Chirag Agarwal, Dan Schonfeld, Anh Nguyen

Interpretability methods often measure the contribution of an input feature to an image classifier's decisions by heuristically removing it via e.g. blurring, adding noise, or graying out, which often produce unrealistic, out-of-distribution samples.

Improving Adversarial Robustness by Encouraging Discriminative Features

no code implementations 1 Nov 2018 Chirag Agarwal, Anh Nguyen, Dan Schonfeld

Intuitively, the center loss encourages DNNs to simultaneously learn a center for the deep features of each class and minimize the distances between the intra-class deep features and their corresponding class centers.

Adversarial Robustness
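
For reference, the center loss mentioned above has a simple closed form: it maintains one learnable center per class and penalizes the squared distance between each deep feature and its class center. A minimal PyTorch sketch (illustrative; in practice it is combined with the usual softmax cross-entropy on the logits):

```python
import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    """L_C = 1/2 * mean_i ||f_i - c_{y_i}||^2, with one learnable center per class."""
    def __init__(self, num_classes: int, feat_dim: int):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        diff = features - self.centers[labels]          # (batch, feat_dim)
        return 0.5 * diff.pow(2).sum(dim=1).mean()

# Usage: total loss = cross-entropy on logits + lambda * center loss on deep features.
center_loss = CenterLoss(num_classes=10, feat_dim=64)
features, labels = torch.randn(32, 64), torch.randint(0, 10, (32,))
print(center_loss(features, labels).item())
```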

An Explainable Adversarial Robustness Metric for Deep Learning Neural Networks

no code implementations 5 Jun 2018 Chirag Agarwal, Bo Dong, Dan Schonfeld, Anthony Hoogs

Instead of simply measuring a DNN's adversarial robustness in the input domain, as in previous works, the proposed NSS is built on top of an insightful mathematical understanding of the adversarial attack and gives a more explicit explanation of the robustness.

Adversarial Attack, Adversarial Robustness +3

Convergence of backpropagation with momentum for network architectures with skip connections

no code implementations 21 May 2017 Chirag Agarwal, Joe Klobusicky, Dan Schonfeld

We study a class of deep neural networks whose architectures form a directed acyclic graph (DAG).
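
For reference, the momentum variant of backpropagation analyzed in the title is the classical heavy-ball update; a minimal sketch (parameter names and values are illustrative, not taken from the paper):

```python
import numpy as np

def gradient_descent_momentum(grad_fn, w0, lr=0.1, beta=0.9, n_steps=100):
    """Heavy-ball update: v <- beta * v - lr * grad(w); w <- w + v."""
    w = np.asarray(w0, dtype=float).copy()
    v = np.zeros_like(w)
    for _ in range(n_steps):
        v = beta * v - lr * grad_fn(w)
        w = w + v
    return w

# Toy quadratic objective 0.5 * ||w||^2 with gradient w: iterates shrink toward 0.
print(gradient_descent_momentum(lambda w: w, w0=[1.0, -2.0]))
```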
