no code implementations • 7 Feb 2024 • Sanjari Srivastava, Piotr Mardziel, Zhikhun Zhang, Archana Ahlawat, Anupam Datta, John C Mitchell
Through the case of binary gender bias, we demonstrate that Counterfactual Data Augmentation (CDA), a known method for addressing bias, also mitigates bias amplification by DP.
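A minimal sketch of how CDA can be applied for binary gender, assuming an illustrative word-pair list (real implementations use curated lists and handle ambiguous forms like "her"/"his" more carefully):

```python
# Counterfactual Data Augmentation (CDA), minimal sketch: pair each training
# example with a gender-swapped copy. The pair list below is illustrative.
GENDER_PAIRS = {"he": "she", "she": "he", "him": "her",
                "man": "woman", "woman": "man", "his": "her"}

def counterfactual(example: str) -> str:
    """Swap each gendered token for its counterpart; leave other tokens alone."""
    return " ".join(GENDER_PAIRS.get(t.lower(), t) for t in example.split())

def augment(corpus: list[str]) -> list[str]:
    """Fine-tune on the original corpus plus its counterfactual copies."""
    return corpus + [counterfactual(s) for s in corpus]

print(augment(["he is a doctor"]))  # ['he is a doctor', 'she is a doctor']
```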
no code implementations • NeurIPS 2021 • Kaiji Lu, Zifan Wang, Piotr Mardziel, Anupam Datta
While “attention is all you need” may be proving true, we do not know why: attention-based transformer models such as BERT are superior, but how information flows from input tokens to output predictions is unclear.
no code implementations • 28 Sep 2020 • Kaiji Lu, Zifan Wang, Piotr Mardziel, Anupam Datta
While “attention is all you need” may be proving true, we do not yet know why: attention-based transformer models such as BERT are superior, but how they contextualize information even for simple grammatical rules such as subject-verb number agreement (SVA) is uncertain.
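A quick way to probe this behavior (a hedged sketch using the Hugging Face fill-mask pipeline, not the analysis from the paper) is to compare the scores BERT assigns to singular versus plural verbs in an agreement template:

```python
from transformers import pipeline

# Probe BERT's subject-verb number agreement with masked-verb templates.
fill = pipeline("fill-mask", model="bert-base-uncased")
for template in ["the keys to the cabinet [MASK] on the table.",
                 "the key to the cabinets [MASK] on the table."]:
    scores = {r["token_str"]: r["score"]
              for r in fill(template, targets=["is", "are"])}
    print(template, "->", scores)
```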
no code implementations • 17 Sep 2020 • Xuan Chen, Zifan Wang, Yucai Fan, Bonan Jin, Piotr Mardziel, Carlee Joe-Wong, Anupam Datta
Feature attribution has been a foundational building block for explaining input feature importance in supervised learning with Deep Neural Networks (DNNs), but faces new challenges when applied to deep Reinforcement Learning (RL). We propose a new approach to explaining deep RL actions by defining a class of “action reconstruction” functions that mimic the behavior of a network in deep RL.
no code implementations • 14 Jun 2020 • Sanghamitra Dutta, Praveen Venkatesh, Piotr Mardziel, Anupam Datta, Pulkit Grover
While quantifying disparity is essential, the needs of an occupation may sometimes require the use of certain critical features, such that any disparity that can be explained by them may need to be exempted.
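One way to make the idea concrete (a sketch with illustrative column names, not the paper's information-theoretic measure) is to compare the overall outcome disparity with the disparity remaining within each stratum of the critical, exempt feature:

```python
import pandas as pd

# Toy data: "certified" is the critical (exempt) feature; names are illustrative.
df = pd.DataFrame({
    "gender":    ["F", "F", "M", "M", "F", "M", "F", "M"],
    "certified": [0,   1,   1,   1,   0,   1,   1,   0],
    "hired":     [0,   1,   1,   1,   0,   1,   1,   0],
})

overall = df.groupby("gender")["hired"].mean()
print("overall gap:", overall["M"] - overall["F"])  # raw disparity

# Disparity within each stratum of the exempt feature: the part it cannot explain.
within = df.groupby(["certified", "gender"])["hired"].mean().unstack("gender")
print("per-stratum gaps:\n", within["M"] - within["F"])
```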
1 code implementation • NeurIPS 2020 • Zifan Wang, Haofan Wang, Shakul Ramkumar, Matt Fredrikson, Piotr Mardziel, Anupam Datta
Feature attributions are a popular tool for explaining the behavior of Deep Neural Networks (DNNs), but have recently been shown to be vulnerable to attacks that produce divergent explanations for nearby inputs.
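The phenomenon can be measured directly (a minimal sketch with a toy network; input-gradient saliency stands in for the attribution methods studied): compute attributions at an input and at a small perturbation of it, then compare.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(10, 16), torch.nn.ReLU(),
                            torch.nn.Linear(16, 1))

def saliency(x: torch.Tensor) -> torch.Tensor:
    """Input-gradient attribution for the model's scalar output."""
    x = x.clone().requires_grad_(True)
    model(x).sum().backward()
    return x.grad.detach()

x = torch.randn(1, 10)
x_near = x + 0.05 * torch.randn(1, 10)  # a nearby input
sim = torch.nn.functional.cosine_similarity(saliency(x), saliency(x_near))
print("attribution similarity under a small perturbation:", sim.item())
```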
no code implementations • ACL 2020 • Kaiji Lu, Piotr Mardziel, Klas Leino, Matt Fredrikson, Anupam Datta
LSTM-based recurrent neural networks are the state-of-the-art for many natural language processing (NLP) tasks.
no code implementations • 19 Feb 2020 • Zifan Wang, Piotr Mardziel, Anupam Datta, Matt Fredrikson
In this work we expand the foundations of human-understandable concepts with which attributions can be interpreted beyond "importance" and its visualization; we incorporate the logical concepts of necessity and sufficiency, and the concept of proportionality.
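A hedged sketch of what necessity- and sufficiency-style probes look like in practice (the toy model and attribution vector are placeholders, not the paper's formal definitions): keep only the top-attributed features and check whether the score is preserved (sufficiency), or remove them and check whether it collapses (necessity).

```python
import numpy as np

def model(x):
    # Toy scorer: a fixed linear model standing in for a trained network.
    w = np.array([3.0, 0.1, -0.2, 2.0])
    return float(w @ x)

x = np.array([1.0, 1.0, 1.0, 1.0])
attr = np.array([3.0, 0.1, -0.2, 2.0])   # e.g. gradient * input
top = np.argsort(-np.abs(attr))[:2]      # two most-attributed features

keep_only = np.zeros_like(x)             # sufficiency probe: keep only top features
keep_only[top] = x[top]
drop = x.copy()                          # necessity probe: remove top features
drop[top] = 0.0
print(model(x), model(keep_only), model(drop))
```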
9 code implementations • 3 Oct 2019 • Haofan Wang, Zifan Wang, Mengnan Du, Fan Yang, Zijian Zhang, Sirui Ding, Piotr Mardziel, Xia Hu
Recently, increasing attention has been drawn to the internal mechanisms of convolutional neural networks and to the reasons why a network makes specific decisions.
1 code implementation • 31 Jul 2018 • Kaiji Lu, Piotr Mardziel, Fangjing Wu, Preetam Amancharla, Anupam Datta
We define a general benchmark to quantify gender bias in a variety of neural NLP tasks.
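The core measurement is a matched-pair comparison: score templates that differ only in a gendered word and aggregate the gaps. A minimal sketch (log_prob is a hypothetical stand-in for a trained language model's scorer):

```python
def log_prob(sentence: str) -> float:
    """Hypothetical stand-in; in practice this scores `sentence` under an LM."""
    table = {"he is a doctor": -1.0, "she is a doctor": -1.6,
             "he is a nurse": -2.2, "she is a nurse": -1.3}
    return table[sentence]

occupations = ["doctor", "nurse"]
gaps = [log_prob(f"he is a {o}") - log_prob(f"she is a {o}") for o in occupations]
print("mean he-vs-she log-probability gap:", sum(gaps) / len(gaps))
```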
no code implementations • 28 Mar 2018 • Shayak Sen, Piotr Mardziel, Anupam Datta, Matthew Fredrikson
Standard methods for training classifiers that minimize empirical risk do not constrain the behavior of the classifier on such datapoints.
no code implementations • 29 Nov 2017 • Anupam Datta, Sophia Kovaleva, Piotr Mardziel, Shayak Sen
The interpretation of latent factors can then replace the uninterpreted latent factors, resulting in a new model that expresses predictions in terms of interpretable features.
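A minimal sketch of the substitution (assuming, for illustration only, that the latent factors happen to be linearly recoverable from interpretable features; the paper's setting is more general): regress each latent factor onto interpretable features, then feed the regression estimates through the original model head.

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.normal(size=(100, 5))          # interpretable features per instance
H = F @ rng.normal(size=(5, 3))        # latent factors (here, linearly related)
w_out = rng.normal(size=3)             # model head mapping factors to output

W, *_ = np.linalg.lstsq(F, H, rcond=None)  # interpretation: factors ~ features
H_hat = F @ W                               # interpreted stand-in for H
print(np.allclose(H @ w_out, H_hat @ w_out))  # surrogate matches original here
```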
3 code implementations • 25 Jul 2017 • Anupam Datta, Matt Fredrikson, Gihyuk Ko, Piotr Mardziel, Shayak Sen
Machine-learnt systems inherit biases against protected classes, i.e. historically disparaged groups, from training data.
no code implementations • 22 May 2017 • Anupam Datta, Matthew Fredrikson, Gihyuk Ko, Piotr Mardziel, Shayak Sen
For a specific instantiation of this definition, we present a program analysis technique that detects instances of proxy use in a model, and provides a witness that identifies which parts of the corresponding program exhibit the behavior.
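In spirit, proxy use requires two things of a program component: association with the protected attribute and influence on the program's output. A toy sketch of those two checks (thresholds and the one-line "program" are illustrative; the paper's technique is a static analysis over the model's program representation):

```python
import numpy as np

rng = np.random.default_rng(1)
protected = rng.integers(0, 2, size=1000)
noise = (rng.random(1000) < 0.05).astype(int)
component = protected ^ noise        # intermediate value: a near-copy of protected

def program(c):
    # Decision logic downstream of the component.
    return c

association = abs(np.corrcoef(component, protected)[0, 1])
influence = np.mean(program(component) != program(1 - component))
print(f"association={association:.2f}, influence={influence:.2f}")
if association > 0.5 and influence > 0.5:
    print("proxy use: component is both associated with the protected "
          "attribute and influential on the output")
```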