Search Results for author: Anupam Datta

Found 25 papers, 6 papers with code

De-amplifying Bias from Differential Privacy in Language Model Fine-tuning

no code implementations 7 Feb 2024 Sanjari Srivastava, Piotr Mardziel, Zhikun Zhang, Archana Ahlawat, Anupam Datta, John C. Mitchell

Through the case of binary gender bias, we demonstrate that Counterfactual Data Augmentation (CDA), a known method for addressing bias, also mitigates bias amplification by DP.

counterfactual Data Augmentation +2
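
The snippet above references Counterfactual Data Augmentation (CDA). As a rough illustration only, CDA for binary gender bias can be thought of as duplicating each training sentence with gendered terms swapped; the word pairs and swapping rule below are assumptions made for this sketch, not the paper's exact lexicon or procedure.

```python
# Minimal illustrative sketch of Counterfactual Data Augmentation (CDA) for
# binary gender bias: every training sentence is duplicated with gendered
# terms swapped. The word pairs are illustrative assumptions only.
GENDER_PAIRS = {
    "he": "she", "she": "he",
    "him": "her", "her": "him",
    "man": "woman", "woman": "man",
}

def counterfactual(sentence: str) -> str:
    """Return the sentence with every gendered term swapped."""
    tokens = sentence.split()
    return " ".join(GENDER_PAIRS.get(t.lower(), t) for t in tokens)

def augment(corpus: list[str]) -> list[str]:
    """CDA: train on the union of the original and swapped sentences."""
    return corpus + [counterfactual(s) for s in corpus]

print(augment(["she is a doctor", "he is a nurse"]))
# ['she is a doctor', 'he is a nurse', 'he is a doctor', 'she is a nurse']
```

Training on the augmented corpus exposes the model to both gendered variants of each context, which is the intuition behind CDA's mitigation of bias.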

Is Certifying $\ell_p$ Robustness Still Worthwhile?

no code implementations 13 Oct 2023 Ravi Mangal, Klas Leino, Zifan Wang, Kai Hu, Weicheng Yu, Corina Pasareanu, Anupam Datta, Matt Fredrikson

There are three layers to this inquiry, which we address in this paper: (1) why do we care about robustness research?

Order-sensitive Shapley Values for Evaluating Conceptual Soundness of NLP Models

no code implementations 1 Jun 2022 Kaiji Lu, Anupam Datta

Previous works show that deep NLP models are not always conceptually sound: they do not always learn the correct linguistic concepts.

Data Augmentation Negation +1

Faithful Explanations for Deep Graph Models

no code implementations 24 May 2022 Zifan Wang, Yuhang Yao, Chaoran Zhang, Han Zhang, Youjie Kang, Carlee Joe-Wong, Matt Fredrikson, Anupam Datta

Second, our analytical and empirical results demonstrate that feature attribution methods cannot capture the nonlinear effect of edge features, while existing subgraph explanation methods are not faithful.

Anomaly Detection

Consistent Counterfactuals for Deep Models

no code implementations ICLR 2022 Emily Black, Zifan Wang, Matt Fredrikson, Anupam Datta

Counterfactual examples are one of the most commonly-cited methods for explaining the predictions of machine learning models in key areas such as finance and medical diagnosis.

counterfactual Medical Diagnosis
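
For intuition, a counterfactual example is a minimally changed input that flips the model's prediction. A minimal sketch, assuming a toy linear scoring model in place of the deep models studied in the paper:

```python
import numpy as np

# Toy score model standing in for a deep model (assumed for illustration):
# predict "approve" (1) when w.x + b >= 0, otherwise "deny" (0).
w, b = np.array([2.0, -1.0]), -0.5
predict = lambda x: int(w @ x + b >= 0)

def nearest_counterfactual(x, step=0.01, max_iter=10_000):
    """Walk toward the decision boundary along its normal until the
    predicted label flips; return that flipped input (a counterfactual)."""
    direction = -np.sign(w @ x + b) * w / np.linalg.norm(w)
    cf = x.copy()
    for _ in range(max_iter):
        if predict(cf) != predict(x):
            return cf
        cf = cf + step * direction
    raise RuntimeError("no counterfactual found within max_iter steps")

x = np.array([1.0, 0.5])              # original input, predicted 1
print(predict(x), nearest_counterfactual(x), sep="\n")
```

The sketch returns the nearest input along the boundary normal whose predicted label differs from the original's.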

Robust Models Are More Interpretable Because Attributions Look Normal

1 code implementation 20 Mar 2021 Zifan Wang, Matt Fredrikson, Anupam Datta

Recent work has found that adversarially-robust deep networks used for image classification are more interpretable: their feature attributions tend to be sharper, and are more concentrated on the objects associated with the image's ground-truth class.

Image Classification

Influence Patterns for Explaining Information Flow in BERT

no code implementations NeurIPS 2021 Kaiji Lu, Zifan Wang, Piotr Mardziel, Anupam Datta

While “attention is all you need” may be proving true, we do not know why: attention-based transformer models such as BERT are superior, but how information flows from input tokens to output predictions is unclear.

Abstracting Influence Paths for Explaining (Contextualization of) BERT Models

no code implementations 28 Sep 2020 Kaiji Lu, Zifan Wang, Piotr Mardziel, Anupam Datta

While “attention is all you need” may be proving true, we do not yet know why: attention-based transformer models such as BERT are superior but how they contextualize information even for simple grammatical rules such as subject-verb number agreement (SVA) is uncertain.

Reconstructing Actions To Explain Deep Reinforcement Learning

no code implementations 17 Sep 2020 Xuan Chen, Zifan Wang, Yucai Fan, Bonan Jin, Piotr Mardziel, Carlee Joe-Wong, Anupam Datta

Feature attribution has been a foundational building block for explaining input feature importance in supervised learning with Deep Neural Networks (DNNs), but faces new challenges when applied to deep Reinforcement Learning (RL). We propose a new approach to explaining deep RL actions by defining a class of action reconstruction functions that mimic the behavior of a network in deep RL.

Atari Games Feature Importance +2

Fairness Under Feature Exemptions: Counterfactual and Observational Measures

no code implementations 14 Jun 2020 Sanghamitra Dutta, Praveen Venkatesh, Piotr Mardziel, Anupam Datta, Pulkit Grover

While quantifying disparity is essential, an occupation may sometimes require the use of certain critical features, such that any disparity that can be explained by them might need to be exempted.

counterfactual Fairness

Smoothed Geometry for Robust Attribution

1 code implementation NeurIPS 2020 Zifan Wang, Haofan Wang, Shakul Ramkumar, Matt Fredrikson, Piotr Mardziel, Anupam Datta

Feature attributions are a popular tool for explaining the behavior of Deep Neural Networks (DNNs), but have recently been shown to be vulnerable to attacks that produce divergent explanations for nearby inputs.

Interpreting Interpretations: Organizing Attribution Methods by Criteria

no code implementations 19 Feb 2020 Zifan Wang, Piotr Mardziel, Anupam Datta, Matt Fredrikson

In this work we expand the foundations of human-understandable concepts with which attributions can be interpreted beyond "importance" and its visualization; we incorporate the logical concepts of necessity and sufficiency, and the concept of proportionality.

Image Classification

Feature-Wise Bias Amplification

no code implementations ICLR 2019 Klas Leino, Emily Black, Matt Fredrikson, Shayak Sen, Anupam Datta

This overestimation gives rise to feature-wise bias amplification -- a previously unreported form of bias that can be traced back to the features of a trained model.

feature selection Inductive Bias

Hunting for Discriminatory Proxies in Linear Regression Models

1 code implementation NeurIPS 2018 Samuel Yeom, Anupam Datta, Matt Fredrikson

In this paper we formulate a definition of proxy use for the setting of linear regression and present algorithms for detecting proxies.

Attribute regression

Supervising Feature Influence

no code implementations 28 Mar 2018 Shayak Sen, Piotr Mardziel, Anupam Datta, Matthew Fredrikson

Standard methods for training classifiers that minimize empirical risk do not constrain the behavior of the classifier on such datapoints.

Active Learning

Influence-Directed Explanations for Deep Convolutional Networks

2 code implementations ICLR 2018 Klas Leino, Shayak Sen, Anupam Datta, Matt Fredrikson, Linyi Li

We study the problem of explaining a rich class of behavioral properties of deep neural networks.

Latent Factor Interpretations for Collaborative Filtering

no code implementations 29 Nov 2017 Anupam Datta, Sophia Kovaleva, Piotr Mardziel, Shayak Sen

The interpretation of latent factors can then replace the uninterpreted latent factors, resulting in a new model that expresses predictions in terms of interpretable features.

Collaborative Filtering Recommendation Systems

Case Study: Explaining Diabetic Retinopathy Detection Deep CNNs via Integrated Gradients

no code implementations 27 Sep 2017 Linyi Li, Matt Fredrikson, Shayak Sen, Anupam Datta

In this report, we apply integrated gradients to explain a neural network for diabetic retinopathy detection.

Diabetic Retinopathy Detection
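
Integrated gradients attributes a model's prediction to input features by accumulating gradients along the straight-line path from a baseline to the input. A minimal sketch of the standard Riemann-sum approximation, using an assumed toy logistic model in place of the paper's retinopathy CNN:

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline=None, steps=50):
    """Approximate IG_i(x) = (x_i - x'_i) * ∫_0^1 ∂F(x' + a(x - x'))/∂x_i da
    with a Riemann sum. `grad_fn(z)` must return the gradient of the model
    output F with respect to its input z."""
    if baseline is None:
        baseline = np.zeros_like(x)            # zero / "black image" baseline
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints of [0, 1]
    grads = np.mean([grad_fn(baseline + a * (x - baseline)) for a in alphas], axis=0)
    return (x - baseline) * grads

# Toy model: F(x) = sigmoid(w.x); its input gradient is sigmoid'(w.x) * w.
w = np.array([1.0, -2.0, 0.5])
def grad_fn(z):
    s = 1.0 / (1.0 + np.exp(-w @ z))
    return s * (1.0 - s) * w

x = np.array([0.8, 0.1, 0.4])
print(integrated_gradients(grad_fn, x))   # per-feature attribution scores
```

By the completeness property, these attributions approximately sum to F(x) - F(baseline).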

Proxy Non-Discrimination in Data-Driven Systems

3 code implementations 25 Jul 2017 Anupam Datta, Matt Fredrikson, Gihyuk Ko, Piotr Mardziel, Shayak Sen

Machine-learnt systems inherit biases against protected classes (historically disparaged groups) from training data.

Use Privacy in Data-Driven Systems: Theory and Experiments with Machine Learnt Programs

no code implementations 22 May 2017 Anupam Datta, Matthew Fredrikson, Gihyuk Ko, Piotr Mardziel, Shayak Sen

For a specific instantiation of this definition, we present a program analysis technique that detects instances of proxy use in a model, and provides a witness that identifies which parts of the corresponding program exhibit the behavior.

General Classification

GOTCHA Password Hackers!

no code implementations 4 Oct 2013 Jeremiah Blocki, Manuel Blum, Anupam Datta

(2) The puzzles are hard for a computer to solve even if it has the random bits used by the computer to generate the final puzzle --- unlike a CAPTCHA.

Differentially Private Data Analysis of Social Networks via Restricted Sensitivity

no code implementations 22 Aug 2012 Jeremiah Blocki, Avrim Blum, Anupam Datta, Or Sheffet

Specifically, given a query f and a hypothesis H about the structure of a dataset D, we show generically how to transform f into a new query f_H whose global sensitivity (over all datasets including those that do not satisfy H) matches the restricted sensitivity of the query f. Moreover, if the belief of the querier is correct (i.e., D is in H) then f_H(D) = f(D).

Cryptography and Security Social and Information Networks Physics and Society
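
To see why this matters, recall that the Laplace mechanism adds noise scaled to a query's global sensitivity, so answering via f_H (whose global sensitivity matches the restricted sensitivity of f under H) can be far more accurate. The numbers and degree bound below are assumed purely for illustration; the construction of f_H itself is the subject of the paper.

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng=np.random.default_rng(0)):
    """Standard Laplace mechanism: noise scale = sensitivity / epsilon."""
    return value + rng.laplace(scale=sensitivity / epsilon)

# Example query on a graph dataset: number of edges.
true_answer = 1_000
epsilon = 0.1

# Global sensitivity of an edge-count query under node privacy can be very
# large (one node may touch every other node); this figure is an assumed
# illustration, not a value from the paper.
global_sensitivity = 10_000

# Restricted sensitivity under the hypothesis H = "maximum degree <= 50":
# adding or removing one node changes the edge count by at most 50.
restricted_sensitivity = 50

print(laplace_mechanism(true_answer, global_sensitivity, epsilon))      # very noisy
print(laplace_mechanism(true_answer, restricted_sensitivity, epsilon))  # far more accurate
```

When the querier's hypothesis H holds for the actual dataset, f_H(D) = f(D), so the accuracy gain comes at no cost in correctness.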
