Search Results for author: Piotr Mardziel

Found 14 papers, 4 papers with code

De-amplifying Bias from Differential Privacy in Language Model Fine-tuning

no code implementations7 Feb 2024 Sanjari Srivastava, Piotr Mardziel, Zhikhun Zhang, Archana Ahlawat, Anupam Datta, John C Mitchell

Through the case of binary gender bias, we demonstrate that Counterfactual Data Augmentation (CDA), a known method for addressing bias, also mitigates bias amplification by DP.

counterfactual · Data Augmentation +2
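A minimal sketch of the CDA idea from this entry: every training sentence is paired with a gender-swapped counterfactual before fine-tuning. The swap table and whitespace tokenization are illustrative assumptions, not the paper's implementation:

```python
# Counterfactual Data Augmentation (CDA) sketch: pair each training
# sentence with a gender-swapped copy. The swap table is illustrative
# (real CDA uses a curated dictionary, and "her" -> "him"/"his" is
# ambiguous, which this naive mapping ignores).
SWAPS = {"he": "she", "she": "he", "him": "her", "her": "him",
         "his": "her", "man": "woman", "woman": "man"}

def counterfactual(sentence: str) -> str:
    """Return the gender-swapped counterfactual of a sentence."""
    return " ".join(SWAPS.get(tok, tok) for tok in sentence.lower().split())

def augment(corpus: list[str]) -> list[str]:
    """Fine-tuning corpus = originals plus their counterfactuals."""
    return corpus + [counterfactual(s) for s in corpus]

print(augment(["He is a doctor", "She stayed home"]))
```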

Influence Patterns for Explaining Information Flow in BERT

no code implementations NeurIPS 2021 Kaiji Lu, Zifan Wang, Piotr Mardziel, Anupam Datta

While "attention is all you need" may be proving true, we do not know why: attention-based transformer models such as BERT are superior, but how information flows from input tokens to output predictions is unclear.

Abstracting Influence Paths for Explaining (Contextualization of) BERT Models

no code implementations28 Sep 2020 Kaiji Lu, Zifan Wang, Piotr Mardziel, Anupam Datta

While “attention is all you need” may be proving true, we do not yet know why: attention-based transformer models such as BERT are superior, but how they contextualize information even for simple grammatical rules such as subject-verb number agreement (SVA) is uncertain.
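Both BERT entries above trace how influence flows from input token embeddings to output predictions. A toy numpy sketch of the per-token gradient that such influence paths aggregate; the two-layer ReLU model standing in for BERT is an assumption:

```python
import numpy as np

# Toy illustration of input-to-output influence: the gradient of an
# output logit w.r.t. each input token embedding. The papers' influence
# *patterns* abstract whole gradient paths through BERT's attention
# layers; this sketch only shows the per-token gradient they aggregate.
rng = np.random.default_rng(0)
T, d, h = 4, 8, 16                      # tokens, embed dim, hidden dim
X = rng.normal(size=(T, d))             # token embeddings
W1 = rng.normal(size=(d, h)); W2 = rng.normal(size=(h, 1))

z = X @ W1                              # pre-activation, shape (T, h)
a = np.maximum(z, 0.0)                  # ReLU
logit = a.sum(axis=0) @ W2              # pool over tokens, then project

# Backprop by hand: d logit / d X, one row per input token.
da = np.tile(W2.T, (T, 1))              # gradient through the sum-pool
dz = da * (z > 0)                       # gradient through ReLU
dX = dz @ W1.T                          # gradient w.r.t. embeddings

influence = np.abs(dX).sum(axis=1)      # scalar influence per token
print("per-token influence:", influence.round(3))
```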

Reconstructing Actions To Explain Deep Reinforcement Learning

no code implementations17 Sep 2020 Xuan Chen, Zifan Wang, Yucai Fan, Bonan Jin, Piotr Mardziel, Carlee Joe-Wong, Anupam Datta

Feature attribution has been a foundational building block for explaining input feature importance in supervised learning with Deep Neural Networks (DNNs), but faces new challenges when applied to deep Reinforcement Learning (RL). We propose a new approach to explaining deep RL actions by defining a class of action reconstruction functions that mimic the behavior of a network in deep RL.

Atari Games · Feature Importance +2
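A rough sketch of the action-reconstruction intuition: explain the action choice rather than a raw Q-value. Here the reconstruction target is assumed to be the Q-margin between the chosen action and the runner-up, for a toy linear Q-network; the paper's function class is richer:

```python
import numpy as np

# Sketch: attribute the *choice* of an action, not a single Q-value.
# The margin between the chosen action's Q-value and the runner-up's is
# the quantity whose sign determines the action; its gradient w.r.t. the
# state says which features drove the choice. Linear Q-network assumed.
rng = np.random.default_rng(1)
d, n_actions = 6, 3
W = rng.normal(size=(d, n_actions))     # toy linear Q-network: Q(s) = s @ W

s = rng.normal(size=d)                  # one state
q = s @ W
a = int(np.argmax(q))                   # chosen action
runner_up = int(np.argsort(q)[-2])      # second-best action

grad = W[:, a] - W[:, runner_up]        # gradient of the decision margin
print("action:", a, "feature attributions:", (grad * s).round(3))
```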

Fairness Under Feature Exemptions: Counterfactual and Observational Measures

no code implementations14 Jun 2020 Sanghamitra Dutta, Praveen Venkatesh, Piotr Mardziel, Anupam Datta, Pulkit Grover

While quantifying disparity is essential, the needs of an occupation may sometimes require the use of certain critical features, in which case any disparity that can be explained by those features might need to be exempted.

counterfactual · Fairness
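A simple stand-in for the exemption idea, on synthetic data: measure disparity within strata of the critical feature, so the part explainable by the exempted feature drops out. The paper's actual measures are counterfactual and information-theoretic; this stratified gap only illustrates the intuition:

```python
import numpy as np

# Observational "exemption" sketch: compare selection rates across groups
# within strata of a critical (exempted) feature. The data-generating
# process below is an assumption chosen so that the critical feature is
# group-correlated and a small direct group effect remains.
rng = np.random.default_rng(2)
n = 20_000
group = rng.integers(0, 2, n)                        # protected attribute
critical = rng.random(n) < 0.3 + 0.4 * group         # exempted, group-correlated
y = rng.random(n) < (0.2 + 0.5 * critical + 0.05 * group)

raw_gap = y[group == 1].mean() - y[group == 0].mean()
strat_gaps = [y[(group == 1) & (critical == c)].mean()
              - y[(group == 0) & (critical == c)].mean()
              for c in (False, True)]
print(f"raw disparity: {raw_gap:.3f}")               # includes explainable part
print("within-stratum disparity:", np.round(strat_gaps, 3))  # ~0.05 remains
```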

Smoothed Geometry for Robust Attribution

1 code implementation NeurIPS 2020 Zifan Wang, Haofan Wang, Shakul Ramkumar, Matt Fredrikson, Piotr Mardziel, Anupam Datta

Feature attributions are a popular tool for explaining the behavior of Deep Neural Networks (DNNs), but have recently been shown to be vulnerable to attacks that produce divergent explanations for nearby inputs.
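One way to see the geometry involved: averaging gradients over Gaussian-perturbed inputs (SmoothGrad-style) makes explanations agree across nearby inputs. This sketch illustrates the smoothing, not the paper's training-time method; the toy ReLU model with its piecewise-constant gradient is an assumption:

```python
import numpy as np

# Robust-attribution sketch: average saliency over Gaussian-perturbed
# inputs so nearby points receive similar explanations.
rng = np.random.default_rng(3)
d = 5
A = rng.normal(size=(d, d))

def f_grad(x):
    """Gradient of the toy model f(x) = sum(relu(A @ x))."""
    return A.T @ (A @ x > 0).astype(float)

x = rng.normal(size=d)
sigma, n = 0.5, 200
smoothed = np.mean([f_grad(x + sigma * rng.normal(size=d))
                    for _ in range(n)], axis=0)
print("plain gradient:   ", f_grad(x).round(3))
print("smoothed gradient:", smoothed.round(3))
```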

Interpreting Interpretations: Organizing Attribution Methods by Criteria

no code implementations19 Feb 2020 Zifan Wang, Piotr Mardziel, Anupam Datta, Matt Fredrikson

In this work we expand the foundations of human-understandable concepts with which attributions can be interpreted beyond "importance" and its visualization; we incorporate the logical concepts of necessity and sufficiency, and the concept of proportionality.

Image Classification
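A minimal sketch of necessity and sufficiency as attribution criteria: the top-attributed features are sufficient if keeping only them preserves the model's score, and necessary if dropping them destroys it. The linear scorer, zero-masking, and k = 3 are assumptions:

```python
import numpy as np

# Evaluate an attribution by masking: sufficiency keeps only the top-k
# attributed features, necessity removes them; compare resulting scores.
rng = np.random.default_rng(4)
d = 10
w = rng.normal(size=d)
model = lambda x: float(w @ x)          # toy linear scorer

x = rng.normal(size=d)
attr = w * x                            # gradient*input attribution
top = np.argsort(-np.abs(attr))[:3]     # top-k features, k = 3

keep_only_top = np.zeros(d); keep_only_top[top] = x[top]
drop_top = x.copy(); drop_top[top] = 0.0

print(f"full score:               {model(x):+.3f}")
print(f"sufficiency (keep top-3): {model(keep_only_top):+.3f}")
print(f"necessity   (drop top-3): {model(drop_top):+.3f}")
```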

Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks

9 code implementations3 Oct 2019 Haofan Wang, Zifan Wang, Mengnan Du, Fan Yang, Zijian Zhang, Sirui Ding, Piotr Mardziel, Xia Hu

Recently, increasing attention has been drawn to the internal mechanisms of convolutional neural networks, and the reason why the network makes specific decisions.

Adversarial Attack · Decision Making +1
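The core Score-CAM move, sketched with random stand-ins for a CNN's activation maps: weight each map by the model's score on the input masked by that (normalized) map, with no gradients involved. Real Score-CAM uses the softmax confidence increase over a baseline and upsamples conv-layer maps; the toy scorer and input-sized maps are assumptions:

```python
import numpy as np

# Score-CAM sketch: score-weighted (not gradient-weighted) combination
# of activation maps.
rng = np.random.default_rng(5)
H = W = 8
img = rng.random((H, W))
acts = rng.random((4, H, W))            # 4 activation maps, already input-sized
score = lambda x: float(x.sum())        # stand-in for the class logit

weights = []
for a in acts:
    m = (a - a.min()) / (a.max() - a.min() + 1e-8)   # normalize map to [0, 1]
    weights.append(score(img * m))                   # score of masked input
weights = np.maximum(np.array(weights), 0)           # keep positive evidence

cam = np.maximum((weights[:, None, None] * acts).sum(axis=0), 0)  # ReLU combine
print("saliency map shape:", cam.shape)
```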

Supervising Feature Influence

no code implementations28 Mar 2018 Shayak Sen, Piotr Mardziel, Anupam Datta, Matthew Fredrikson

Standard methods for training classifiers by minimizing empirical risk do not constrain the behavior of the classifier on particular datapoints of interest.

Active Learning
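A minimal sketch of supervising feature influence during training: penalize the model's sensitivity to a designated feature. For a linear model the input gradient is just the weight vector, so the penalty shrinks that weight; the data, squared loss, and lambda are assumptions:

```python
import numpy as np

# Constrain a classifier's sensitivity to feature j by penalizing its
# influence (for a linear model, the corresponding weight) during
# gradient descent on the empirical risk.
rng = np.random.default_rng(6)
n, d, j, lam = 500, 4, 0, 10.0          # j = feature whose influence we cap
X = rng.normal(size=(n, d))
y = X @ np.array([2.0, 1.0, -1.0, 0.5]) + 0.1 * rng.normal(size=n)

w = np.zeros(d)
for _ in range(2000):                   # gradient descent on penalized risk
    grad = X.T @ (X @ w - y) / n        # squared-loss gradient
    grad[j] += lam * w[j]               # influence penalty on feature j
    w -= 0.05 * grad
print("learned weights:", w.round(3))   # w[0] shrunk well below its true 2.0
```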

Latent Factor Interpretations for Collaborative Filtering

no code implementations29 Nov 2017 Anupam Datta, Sophia Kovaleva, Piotr Mardziel, Shayak Sen

The interpretation of latent factors can then replace the uninterpreted latent factors, resulting in a new model that expresses predictions in terms of interpretable features.

Collaborative Filtering · Recommendation Systems
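A small sketch of the recipe this entry describes: learn item factors by factorization (plain SVD below), then regress each latent factor on interpretable item features so the fitted interpretation can replace the opaque factor. The synthetic ratings and features are assumptions:

```python
import numpy as np

# Interpret latent factors: factorize the rating matrix, then express
# each item factor as a linear function of known interpretable features.
rng = np.random.default_rng(7)
n_users, n_items, k = 50, 30, 3
R = rng.random((n_users, n_items))              # dense toy rating matrix
F = rng.random((n_items, 5))                    # interpretable item features

U, s, Vt = np.linalg.svd(R, full_matrices=False)
item_factors = Vt[:k].T                         # (n_items, k) latent factors

# Least-squares fit: latent factor ~ interpretable features (+ intercept).
F1 = np.hstack([F, np.ones((n_items, 1))])
coef, *_ = np.linalg.lstsq(F1, item_factors, rcond=None)
print("feature-to-factor coefficients:\n", coef.round(2))
```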

Proxy Non-Discrimination in Data-Driven Systems

3 code implementations25 Jul 2017 Anupam Datta, Matt Fredrikson, Gihyuk Ko, Piotr Mardziel, Shayak Sen

Machine-learnt systems inherit biases against protected classes (historically disparaged groups) from training data.

Use Privacy in Data-Driven Systems: Theory and Experiments with Machine Learnt Programs

no code implementations22 May 2017 Anupam Datta, Matthew Fredrikson, Gihyuk Ko, Piotr Mardziel, Shayak Sen

For a specific instantiation of this definition, we present a program analysis technique that detects instances of proxy use in a model, and provides a witness that identifies which parts of the corresponding program exhibit the behavior.

General Classification
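A toy version of the two-part proxy-use test shared by this entry and the previous one: flag an internal computation that is both associated with the protected attribute and influential on the output. The hand-built zip-code program, data, and thresholds are assumptions; the papers find such witnesses statically, via program analysis:

```python
import numpy as np

# Proxy-use sketch: a sub-computation is flagged when it is both
# (1) associated with the protected attribute and (2) influential on
# the program's output.
rng = np.random.default_rng(8)
n = 5_000
protected = rng.integers(0, 2, n)
zip_code = (rng.random(n) < 0.2 + 0.6 * protected).astype(float)  # correlated
income = rng.normal(size=n)

intermediate = 2.0 * zip_code                  # sub-computation under test
output = intermediate + income                 # the model's decision score

association = abs(np.corrcoef(protected, intermediate)[0, 1])
output_ablated = income                        # output with intermediate := 0
influence = np.mean(np.abs(output - output_ablated))
print(f"association = {association:.2f}, influence = {influence:.2f}")
print("proxy use detected" if association > 0.3 and influence > 0.1
      else "no proxy use")
```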
