Search Results for author: Anupam Datta

Found 25 papers, 6 papers with code

De-amplifying Bias from Differential Privacy in Language Model Fine-tuning

no code implementations 7 Feb 2024 Sanjari Srivastava, Piotr Mardziel, Zhikun Zhang, Archana Ahlawat, Anupam Datta, John C. Mitchell

Through the case of binary gender bias, we demonstrate that Counterfactual Data Augmentation (CDA), a known method for addressing bias, also mitigates bias amplification by DP.

counterfactual Data Augmentation +2
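
The snippet above references Counterfactual Data Augmentation (CDA). As a rough illustration only, CDA for binary gender bias can be thought of as duplicating each training sentence with gendered terms swapped; the word pairs and swapping rule below are assumptions made for this sketch, not the paper's exact lexicon or procedure.

```python
# Minimal illustrative sketch of Counterfactual Data Augmentation (CDA) for
# binary gender bias: every training sentence is duplicated with gendered
# terms swapped. The word pairs are illustrative assumptions only.
GENDER_PAIRS = {
    "he": "she", "she": "he",
    "him": "her", "her": "him",
    "man": "woman", "woman": "man",
}

def counterfactual(sentence: str) -> str:
    """Return the sentence with every gendered term swapped."""
    tokens = sentence.split()
    return " ".join(GENDER_PAIRS.get(t.lower(), t) for t in tokens)

def augment(corpus: list[str]) -> list[str]:
    """CDA: train on the union of the original and swapped sentences."""
    return corpus + [counterfactual(s) for s in corpus]

print(augment(["she is a doctor", "he is a nurse"]))
# ['she is a doctor', 'he is a nurse', 'he is a doctor', 'she is a nurse']
```

Training on the augmented corpus exposes the model to both gendered variants of each context, which is the intuition behind CDA's mitigation of bias.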

Is Certifying $\ell_p$ Robustness Still Worthwhile?

no code implementations 13 Oct 2023 Ravi Mangal, Klas Leino, Zifan Wang, Kai Hu, Weicheng Yu, Corina Pasareanu, Anupam Datta, Matt Fredrikson

There are three layers to this inquiry, which we address in this paper: (1) why do we care about robustness research?

Order-sensitive Shapley Values for Evaluating Conceptual Soundness of NLP Models

no code implementations 1 Jun 2022 Kaiji Lu, Anupam Datta

Previous works show that deep NLP models are not always conceptually sound: they do not always learn the correct linguistic concepts.

Data Augmentation Negation +1

Faithful Explanations for Deep Graph Models

no code implementations 24 May 2022 Zifan Wang, Yuhang Yao, Chaoran Zhang, Han Zhang, Youjie Kang, Carlee Joe-Wong, Matt Fredrikson, Anupam Datta

Second, our analytical and empirical results demonstrate that feature attribution methods cannot capture the nonlinear effect of edge features, while existing subgraph explanation methods are not faithful.

Anomaly Detection

Consistent Counterfactuals for Deep Models

no code implementations ICLR 2022 Emily Black, Zifan Wang, Matt Fredrikson, Anupam Datta

Counterfactual examples are one of the most commonly-cited methods for explaining the predictions of machine learning models in key areas such as finance and medical diagnosis.

counterfactual Medical Diagnosis
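
For intuition, a counterfactual example is a minimally changed input that flips the model's prediction. A minimal sketch, assuming a toy linear scoring model in place of the deep models studied in the paper:

```python
import numpy as np

# Toy score model standing in for a deep model (assumed for illustration):
# predict "approve" (1) when w.x + b >= 0, otherwise "deny" (0).
w, b = np.array([2.0, -1.0]), -0.5
predict = lambda x: int(w @ x + b >= 0)

def nearest_counterfactual(x, step=0.01, max_iter=10_000):
    """Walk toward the decision boundary along its normal until the
    predicted label flips; return that flipped input (a counterfactual)."""
    direction = -np.sign(w @ x + b) * w / np.linalg.norm(w)
    cf = x.copy()
    for _ in range(max_iter):
        if predict(cf) != predict(x):
            return cf
        cf = cf + step * direction
    raise RuntimeError("no counterfactual found within max_iter steps")

x = np.array([1.0, 0.5])              # original input, predicted 1
print(predict(x), nearest_counterfactual(x), sep="\n")
```

The sketch returns the nearest input along the boundary normal whose predicted label differs from the original's.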

Robust Models Are More Interpretable Because Attributions Look Normal

1 code implementation 20 Mar 2021 Zifan Wang, Matt Fredrikson, Anupam Datta

Recent work has found that adversarially-robust deep networks used for image classification are more interpretable: their feature attributions tend to be sharper, and are more concentrated on the objects associated with the image's ground-truth class.

Image Classification

Influence Patterns for Explaining Information Flow in BERT

no code implementations NeurIPS 2021 Kaiji Lu, Zifan Wang, Piotr Mardziel, Anupam Datta

While “attention is all you need” may be proving true, we do not know why: attention-based transformer models such as BERT are superior, but how information flows from input tokens to output predictions is unclear.

Abstracting Influence Paths for Explaining (Contextualization of) BERT Models

no code implementations 28 Sep 2020 Kaiji Lu, Zifan Wang, Piotr Mardziel, Anupam Datta

While “attention is all you need” may be proving true, we do not yet know why: attention-based transformer models such as BERT are superior but how they contextualize information even for simple grammatical rules such as subject-verb number agreement (SVA) is uncertain.

Reconstructing Actions To Explain Deep Reinforcement Learning

no code implementations 17 Sep 2020 Xuan Chen, Zifan Wang, Yucai Fan, Bonan Jin, Piotr Mardziel, Carlee Joe-Wong, Anupam Datta

Feature attribution has been a foundational building block for explaining input feature importance in supervised learning with Deep Neural Networks (DNNs), but faces new challenges when applied to deep Reinforcement Learning (RL). We propose a new approach to explaining deep RL actions by defining a class of action reconstruction functions that mimic the behavior of a network in deep RL.

Atari Games Feature Importance +2

Fairness Under Feature Exemptions: Counterfactual and Observational Measures

no code implementations 14 Jun 2020 Sanghamitra Dutta, Praveen Venkatesh, Piotr Mardziel, Anupam Datta, Pulkit Grover

While quantifying disparity is essential, an occupation may sometimes require the use of certain critical features, such that any disparity that can be explained by them might need to be exempted.

counterfactual Fairness

Smoothed Geometry for Robust Attribution

1 code implementation NeurIPS 2020 Zifan Wang, Haofan Wang, Shakul Ramkumar, Matt Fredrikson, Piotr Mardziel, Anupam Datta

Feature attributions are a popular tool for explaining the behavior of Deep Neural Networks (DNNs), but have recently been shown to be vulnerable to attacks that produce divergent explanations for nearby inputs.

Interpreting Interpretations: Organizing Attribution Methods by Criteria

no code implementations 19 Feb 2020 Zifan Wang, Piotr Mardziel, Anupam Datta, Matt Fredrikson

In this work we expand the foundations of human-understandable concepts with which attributions can be interpreted beyond "importance" and its visualization; we incorporate the logical concepts of necessity and sufficiency, and the concept of proportionality.

Image Classification

Feature-Wise Bias Amplification

no code implementations ICLR 2019 Klas Leino, Emily Black, Matt Fredrikson, Shayak Sen, Anupam Datta

This overestimation gives rise to feature-wise bias amplification -- a previously unreported form of bias that can be traced back to the features of a trained model.

feature selection Inductive Bias

Hunting for Discriminatory Proxies in Linear Regression Models

1 code implementation NeurIPS 2018 Samuel Yeom, Anupam Datta, Matt Fredrikson

In this paper we formulate a definition of proxy use for the setting of linear regression and present algorithms for detecting proxies.

Attribute regression

Supervising Feature Influence

no code implementations 28 Mar 2018 Shayak Sen, Piotr Mardziel, Anupam Datta, Matthew Fredrikson

Standard methods for training classifiers that minimize empirical risk do not constrain the behavior of the classifier on such datapoints.

Active Learning

Influence-Directed Explanations for Deep Convolutional Networks

2 code implementations ICLR 2018 Klas Leino, Shayak Sen, Anupam Datta, Matt Fredrikson, Linyi Li

We study the problem of explaining a rich class of behavioral properties of deep neural networks.

Latent Factor Interpretations for Collaborative Filtering

no code implementations 29 Nov 2017 Anupam Datta, Sophia Kovaleva, Piotr Mardziel, Shayak Sen

The interpretation of latent factors can then replace the uninterpreted latent factors, resulting in a new model that expresses predictions in terms of interpretable features.

Collaborative Filtering Recommendation Systems

Case Study: Explaining Diabetic Retinopathy Detection Deep CNNs via Integrated Gradients

no code implementations 27 Sep 2017 Linyi Li, Matt Fredrikson, Shayak Sen, Anupam Datta

In this report, we apply integrated gradients to explain a neural network for diabetic retinopathy detection.

Diabetic Retinopathy Detection
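
Integrated gradients attributes a model's prediction to input features by accumulating gradients along the straight-line path from a baseline to the input. A minimal sketch of the standard Riemann-sum approximation, using an assumed toy logistic model in place of the paper's retinopathy CNN:

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline=None, steps=50):
    """Approximate IG_i(x) = (x_i - x'_i) * ∫_0^1 ∂F(x' + a(x - x'))/∂x_i da
    with a Riemann sum. `grad_fn(z)` must return the gradient of the model
    output F with respect to its input z."""
    if baseline is None:
        baseline = np.zeros_like(x)            # zero / "black image" baseline
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints of [0, 1]
    grads = np.mean([grad_fn(baseline + a * (x - baseline)) for a in alphas], axis=0)
    return (x - baseline) * grads

# Toy model: F(x) = sigmoid(w.x); its input gradient is sigmoid'(w.x) * w.
w = np.array([1.0, -2.0, 0.5])
def grad_fn(z):
    s = 1.0 / (1.0 + np.exp(-w @ z))
    return s * (1.0 - s) * w

x = np.array([0.8, 0.1, 0.4])
print(integrated_gradients(grad_fn, x))   # per-feature attribution scores
```

By the completeness property, these attributions approximately sum to F(x) - F(baseline).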

Proxy Non-Discrimination in Data-Driven Systems

3 code implementations 25 Jul 2017 Anupam Datta, Matt Fredrikson, Gihyuk Ko, Piotr Mardziel, Shayak Sen

Machine-learnt systems inherit biases against protected classes (historically disparaged groups) from training data.

Use Privacy in Data-Driven Systems: Theory and Experiments with Machine Learnt Programs

no code implementations 22 May 2017 Anupam Datta, Matthew Fredrikson, Gihyuk Ko, Piotr Mardziel, Shayak Sen

For a specific instantiation of this definition, we present a program analysis technique that detects instances of proxy use in a model, and provides a witness that identifies which parts of the corresponding program exhibit the behavior.

General Classification

GOTCHA Password Hackers!

no code implementations 4 Oct 2013 Jeremiah Blocki, Manuel Blum, Anupam Datta

(2) The puzzles are hard for a computer to solve even if it has the random bits used by the computer to generate the final puzzle --- unlike a CAPTCHA.

Differentially Private Data Analysis of Social Networks via Restricted Sensitivity

no code implementations 22 Aug 2012 Jeremiah Blocki, Avrim Blum, Anupam Datta, Or Sheffet

Specifically, given a query f and a hypothesis H about the structure of a dataset D, we show generically how to transform f into a new query f_H whose global sensitivity (over all datasets including those that do not satisfy H) matches the restricted sensitivity of the query f. Moreover, if the belief of the querier is correct (i.e., D is in H) then f_H(D) = f(D).

Cryptography and Security Social and Information Networks Physics and Society
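
To see why this matters, recall that the Laplace mechanism adds noise scaled to a query's global sensitivity, so answering via f_H (whose global sensitivity matches the restricted sensitivity of f under H) can be far more accurate. The numbers and degree bound below are assumed purely for illustration; the construction of f_H itself is the subject of the paper.

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng=np.random.default_rng(0)):
    """Standard Laplace mechanism: noise scale = sensitivity / epsilon."""
    return value + rng.laplace(scale=sensitivity / epsilon)

# Example query on a graph dataset: number of edges.
true_answer = 1_000
epsilon = 0.1

# Global sensitivity of an edge-count query under node privacy can be very
# large (one node may touch every other node); this figure is an assumed
# illustration, not a value from the paper.
global_sensitivity = 10_000

# Restricted sensitivity under the hypothesis H = "maximum degree <= 50":
# adding or removing one node changes the edge count by at most 50.
restricted_sensitivity = 50

print(laplace_mechanism(true_answer, global_sensitivity, epsilon))      # very noisy
print(laplace_mechanism(true_answer, restricted_sensitivity, epsilon))  # far more accurate
```

When the querier's hypothesis H holds for the actual dataset, f_H(D) = f(D), so the accuracy gain comes at no cost in correctness.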
