Search Results for author: Matt Fredrikson

Found 32 papers, 17 papers with code

Transfer Attacks and Defenses for Large Language Models on Coding Tasks

no code implementations22 Nov 2023 Chi Zhang, Zifan Wang, Ravi Mangal, Matt Fredrikson, Limin Jia, Corina Pasareanu

LLMs improve upon previous neural network models of code, such as code2seq or seq2seq, that already demonstrated competitive results when performing tasks such as code summarization and identifying code vulnerabilities.

Code Summarization

Is Certifying $\ell_p$ Robustness Still Worthwhile?

no code implementations13 Oct 2023 Ravi Mangal, Klas Leino, Zifan Wang, Kai Hu, Weicheng Yu, Corina Pasareanu, Anupam Datta, Matt Fredrikson

There are three layers to this inquiry, which we address in this paper: (1) why do we care about robustness research?

A Recipe for Improved Certifiable Robustness: Capacity and Data

1 code implementation4 Oct 2023 Kai Hu, Klas Leino, Zifan Wang, Matt Fredrikson

A key challenge, supported both theoretically and empirically, is that robustness demands greater network capacity and more data than standard training.

Data Augmentation

Representation Engineering: A Top-Down Approach to AI Transparency

1 code implementation2 Oct 2023 Andy Zou, Long Phan, Sarah Chen, James Campbell, Phillip Guo, Richard Ren, Alexander Pan, Xuwang Yin, Mantas Mazeika, Ann-Kathrin Dombrowski, Shashwat Goel, Nathaniel Li, Michael J. Byun, Zifan Wang, Alex Mallen, Steven Basart, Sanmi Koyejo, Dawn Song, Matt Fredrikson, J. Zico Kolter, Dan Hendrycks

In this paper, we identify and characterize the emerging area of representation engineering (RepE), an approach to enhancing the transparency of AI systems that draws on insights from cognitive neuroscience.

Question Answering
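
To make "reading" a representation concrete, here is a toy sketch of one of the basic tools in this line of work: a difference-of-means direction that separates activations collected under two contrasting conditions. The activations below are synthetic stand-ins; real usage would take hidden states from a chosen layer of an actual LLM.

```python
# Toy representation-reading sketch: find a direction in activation space that
# separates two sets of (here synthetic) activations, then score new
# activations by projecting onto it.
import numpy as np

rng = np.random.default_rng(0)
d = 512
acts_pos = rng.normal(0.0, 1.0, size=(64, d)) + 0.5   # activations under one condition
acts_neg = rng.normal(0.0, 1.0, size=(64, d)) - 0.5   # activations under the contrasting condition

direction = acts_pos.mean(0) - acts_neg.mean(0)        # difference-of-means reading vector
direction /= np.linalg.norm(direction)

new_act = rng.normal(0.0, 1.0, size=d) + 0.5
print("concept score:", float(new_act @ direction))    # larger => closer to the first condition
```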

Universal and Transferable Adversarial Attacks on Aligned Language Models

11 code implementations27 Jul 2023 Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, Matt Fredrikson

Specifically, our approach finds a suffix that, when attached to a wide range of queries for an LLM to produce objectionable content, aims to maximize the probability that the model produces an affirmative response (rather than refusing to answer).

Adversarial Attack
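
A hedged sketch of a single greedy coordinate gradient step in the spirit of this attack. GPT-2 and the prompt/suffix/target strings below are placeholders so the code runs locally; the paper optimizes suffixes against aligned chat models and repeats this step many times over batches of prompts.

```python
# One greedy-coordinate-gradient style step: gradient w.r.t. a one-hot
# relaxation of the suffix tokens proposes candidate token swaps, which are
# then evaluated exactly and the best swap is kept.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
emb_matrix = model.get_input_embeddings().weight                   # (vocab, d)

prompt = tok("Tell me how to", return_tensors="pt").input_ids[0]
suffix = tok(" ! ! ! ! !", return_tensors="pt").input_ids[0]       # placeholder suffix init
target = tok(" Sure, here is", return_tensors="pt").input_ids[0]   # desired affirmative prefix

def loss_for(suffix_ids):
    with torch.no_grad():
        ids = torch.cat([prompt, suffix_ids, target]).unsqueeze(0)
        labels = ids.clone()
        labels[:, : prompt.numel() + suffix_ids.numel()] = -100    # score only the target tokens
        return model(ids, labels=labels).loss

# Gradient of the target loss w.r.t. a one-hot relaxation of the suffix tokens.
one_hot = torch.nn.functional.one_hot(suffix, emb_matrix.shape[0]).float().requires_grad_(True)
embeds = torch.cat([emb_matrix[prompt], one_hot @ emb_matrix, emb_matrix[target]]).unsqueeze(0)
labels = torch.cat([prompt, suffix, target]).unsqueeze(0).clone()
labels[:, : prompt.numel() + suffix.numel()] = -100
model(inputs_embeds=embeds, labels=labels).loss.backward()

# Propose the most promising token swaps per position, evaluate, keep the best.
candidates = (-one_hot.grad).topk(8, dim=1).indices
best_loss, best_suffix = loss_for(suffix), suffix
for pos in range(suffix.numel()):
    for cand in candidates[pos]:
        trial = suffix.clone()
        trial[pos] = cand
        l = loss_for(trial)
        if l < best_loss:
            best_loss, best_suffix = l, trial
print("suffix after one GCG-style step:", tok.decode(best_suffix))
```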

Unlocking Deterministic Robustness Certification on ImageNet

2 code implementations NeurIPS 2023 Kai Hu, Andy Zou, Zifan Wang, Klas Leino, Matt Fredrikson

We show that fast ways of bounding the Lipschitz constant for conventional ResNets are loose, and show how to address this by designing a new residual block, leading to the Linear ResNet (LiResNet) architecture.
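
A toy numerical illustration (my own, not from the paper) of why generic residual bounds are loose: for a purely linear residual block x ↦ x + Wx, the exact Lipschitz constant ‖I + W‖₂ can be far smaller than the branch-wise bound 1 + ‖W‖₂, which is what a linear residual block lets you compute directly.

```python
# Compare the generic residual bound 1 + ||W||_2 with the exact Lipschitz
# constant ||I + W||_2 of the combined linear map.
import numpy as np

rng = np.random.default_rng(0)
d = 64
W = -0.9 * np.eye(d) + 0.05 * rng.standard_normal((d, d))

naive_bound = 1.0 + np.linalg.norm(W, 2)        # bound for a generic residual branch
exact_lip = np.linalg.norm(np.eye(d) + W, 2)    # exact constant of the combined map
print(f"naive residual bound: {naive_bound:.2f}, exact Lipschitz constant: {exact_lip:.2f}")
```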

Learning Modulo Theories

no code implementations26 Jan 2023 Matt Fredrikson, Kaiji Lu, Saranya Vijayakumar, Somesh Jha, Vijay Ganesh, Zifan Wang

Recent techniques that integrate solver layers into Deep Neural Networks (DNNs) have shown promise in bridging a long-standing gap between inductive learning and symbolic reasoning techniques.

Black-Box Audits for Group Distribution Shifts

no code implementations8 Sep 2022 Marc Juarez, Samuel Yeom, Matt Fredrikson

Our experimental results on real-world datasets show that this approach is effective, achieving 80--100% AUC-ROC in detecting shifts involving the underrepresentation of a demographic group in the training set.

On the Perils of Cascading Robust Classifiers

1 code implementation1 Jun 2022 Ravi Mangal, Zifan Wang, Chi Zhang, Klas Leino, Corina Pasareanu, Matt Fredrikson

We present the cascade attack (CasA), an adversarial attack against cascading ensembles, and show that: (1) there exists an adversarial input for up to 88% of the samples where the ensemble claims to be certifiably robust and accurate; and (2) the accuracy of a cascading ensemble under our attack is as low as 11% when it claims to be certifiably robust and accurate on 97% of the test set.

Adversarial Attack
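
For context, a minimal sketch of how a cascading certified ensemble answers queries, using a hypothetical model interface that returns a label together with a certified-robust flag; the cascade attack exploits exactly this "first certificate wins" structure by finding nearby inputs where a different constituent fires first and disagrees.

```python
# Minimal cascading certified ensemble: report the first constituent's answer
# that comes with a robustness certificate, fall back to the last model otherwise.
from typing import Callable, List, Tuple
import numpy as np

Model = Callable[[np.ndarray], Tuple[int, bool]]   # returns (label, certified)

def cascade_predict(models: List[Model], x: np.ndarray) -> Tuple[int, bool]:
    for m in models:
        label, certified = m(x)
        if certified:
            return label, True            # stop at the first certificate
    return models[-1](x)[0], False        # no certificate anywhere in the cascade

# Two toy constituents: the first certifies nothing, the second certifies some inputs.
m1: Model = lambda x: (0, False)
m2: Model = lambda x: (1, bool(x.sum() > 0))
print(cascade_predict([m1, m2], np.array([0.5, -0.1])))
```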

Faithful Explanations for Deep Graph Models

no code implementations24 May 2022 Zifan Wang, Yuhang Yao, Chaoran Zhang, Han Zhang, Youjie Kang, Carlee Joe-Wong, Matt Fredrikson, Anupam Datta

Second, our analytical and empirical results demonstrate that feature attribution methods cannot capture the nonlinear effect of edge features, while existing subgraph explanation methods are not faithful.

Anomaly Detection

Selective Ensembles for Consistent Predictions

no code implementations ICLR 2022 Emily Black, Klas Leino, Matt Fredrikson

Recent work has shown that models trained to the same objective, and which achieve similar measures of accuracy on consistent test data, may nonetheless behave very differently on individual predictions.

Medical Diagnosis
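
A simplified sketch of the selective-prediction idea: predict the modal class of several independently trained models only when their agreement is statistically significant, otherwise abstain. The test and thresholds here are illustrative, not the paper's exact procedure.

```python
# Selective ensemble sketch: abstain unless the modal vote is significantly
# more common than chance.  (Simplified; the paper's test and constants may differ.)
from collections import Counter
from scipy.stats import binomtest

def selective_predict(votes, alpha=0.05):
    """votes: list of class labels, one per independently trained model."""
    label, count = Counter(votes).most_common(1)[0]
    # Null hypothesis: the modal class is no more likely than a coin flip
    # against "everything else"; abstain when we cannot reject it.
    p = binomtest(count, n=len(votes), p=0.5, alternative="greater").pvalue
    return label if p < alpha else None   # None = abstain

print(selective_predict([1, 1, 1, 1, 1, 0, 1, 1, 1, 1]))  # confident -> 1
print(selective_predict([0, 1, 0, 1, 1, 0]))              # ambiguous -> None (abstain)
```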

Consistent Counterfactuals for Deep Models

no code implementations ICLR 2022 Emily Black, Zifan Wang, Matt Fredrikson, Anupam Datta

Counterfactual examples are one of the most commonly-cited methods for explaining the predictions of machine learning models in key areas such as finance and medical diagnosis.

counterfactual Medical Diagnosis
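
For readers unfamiliar with the object of study, here is a standard gradient-based (Wachter-style) counterfactual search on a toy model. This is the kind of counterfactual whose consistency across nearby models the paper examines, not the paper's own method.

```python
# Gradient-based counterfactual search: find a nearby input that the model
# assigns to a desired target class.
import torch

def find_counterfactual(model, x, target_class, steps=200, lam=0.1, lr=0.05):
    cf = x.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([cf], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        logits = model(cf.unsqueeze(0))
        loss = torch.nn.functional.cross_entropy(
            logits, torch.tensor([target_class])
        ) + lam * torch.norm(cf - x)        # stay close to the original input
        loss.backward()
        opt.step()
    return cf.detach()

# Usage with a toy model:
model = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.ReLU(), torch.nn.Linear(8, 2))
cf = find_counterfactual(model, torch.randn(4), target_class=1)
```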

Degradation Attacks on Certifiably Robust Neural Networks

no code implementations29 Sep 2021 Klas Leino, Chi Zhang, Ravi Mangal, Matt Fredrikson, Bryan Parno, Corina Pasareanu

Certifiably robust neural networks employ provable run-time defenses against adversarial examples by checking if the model is locally robust at the input under evaluation.

Self-Correcting Neural Networks For Safe Classification

1 code implementation23 Jul 2021 Klas Leino, Aymeric Fromherz, Ravi Mangal, Matt Fredrikson, Bryan Parno, Corina Păsăreanu

These constraints relate requirements on the order of the classes output by a classifier to conditions on its input, and are expressive enough to encode various interesting examples of classifier safety specifications from the literature.

Classification

Leave-one-out Unfairness

no code implementations21 Jul 2021 Emily Black, Matt Fredrikson

We introduce leave-one-out unfairness, which characterizes how likely a model's prediction for an individual will change due to the inclusion or removal of a single other person in the model's training data.

Fairness Memorization
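
A toy estimate of leave-one-out instability, assuming a small sklearn setup: retrain with each single training example dropped and count how often a test prediction flips. The paper's formal definition and experiments are more careful than this sketch.

```python
# How often does a test point's predicted label change when one training
# example is dropped?  (Illustrative only.)
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_tr, y_tr, x_test = X[:150], y[:150], X[150:]

base = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict(x_test)

flips = np.zeros(len(x_test))
for i in range(len(X_tr)):
    mask = np.arange(len(X_tr)) != i          # leave example i out
    pred = LogisticRegression(max_iter=1000).fit(X_tr[mask], y_tr[mask]).predict(x_test)
    flips += (pred != base)

print("fraction of test points whose label ever flips:", np.mean(flips > 0))
```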

Relaxing Local Robustness

1 code implementation NeurIPS 2021 Klas Leino, Matt Fredrikson

Certifiable local robustness, which rigorously precludes small-norm adversarial examples, has received significant attention as a means of addressing security concerns in deep learning.

Robust Models Are More Interpretable Because Attributions Look Normal

1 code implementation20 Mar 2021 Zifan Wang, Matt Fredrikson, Anupam Datta

Recent work has found that adversarially-robust deep networks used for image classification are more interpretable: their feature attributions tend to be sharper, and are more concentrated on the objects associated with the image's ground-truth class.

Image Classification

Globally-Robust Neural Networks

2 code implementations16 Feb 2021 Klas Leino, Zifan Wang, Matt Fredrikson

We show that widely-used architectures can be easily adapted to this objective by incorporating efficient global Lipschitz bounds into the network, yielding certifiably-robust models by construction that achieve state-of-the-art verifiable accuracy.
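
A simplified post-hoc version of the certification idea, on a toy fully-connected network: bound the global Lipschitz constant by the product of the layers' spectral norms and certify a prediction when the logit margin exceeds what any eps-bounded perturbation could close. The paper's construction (GloRo Nets) instead folds this bound into an extra logit during training.

```python
# Post-hoc l2 certification with a global Lipschitz upper bound.
import torch

net = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 10))

# Global l2 Lipschitz bound: product of layer spectral norms (ReLU is 1-Lipschitz).
L = 1.0
for layer in net:
    if isinstance(layer, torch.nn.Linear):
        L *= torch.linalg.matrix_norm(layer.weight, ord=2).item()

def certify(x, eps):
    logits = net(x.unsqueeze(0)).squeeze(0)
    top2 = logits.topk(2).values
    margin = (top2[0] - top2[1]).item()
    # Any logit difference is at most sqrt(2)*L-Lipschitz, so a large enough
    # margin rules out a label change within the eps-ball.
    return margin > (2 ** 0.5) * L * eps

print(certify(torch.randn(16), eps=0.1))
```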

Smoothed Geometry for Robust Attribution

1 code implementation NeurIPS 2020 Zifan Wang, Haofan Wang, Shakul Ramkumar, Matt Fredrikson, Piotr Mardziel, Anupam Datta

Feature attributions are a popular tool for explaining the behavior of Deep Neural Networks (DNNs), but have recently been shown to be vulnerable to attacks that produce divergent explanations for nearby inputs.
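
To make the smoothing concrete, here is a SmoothGrad-style attribution that averages input gradients over Gaussian noise, on a toy model; the paper's contribution is the geometric analysis connecting this kind of smoothing to the robustness of attributions.

```python
# SmoothGrad-style attribution: average input gradients over Gaussian noise.
import torch

def smooth_grad(model, x, target, n=50, sigma=0.1):
    grads = torch.zeros_like(x)
    for _ in range(n):
        noisy = (x + sigma * torch.randn_like(x)).requires_grad_(True)
        model(noisy.unsqueeze(0))[0, target].backward()
        grads += noisy.grad
    return grads / n

model = torch.nn.Sequential(torch.nn.Linear(8, 16), torch.nn.ReLU(), torch.nn.Linear(16, 3))
attribution = smooth_grad(model, torch.randn(8), target=0)
```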

Interpreting Interpretations: Organizing Attribution Methods by Criteria

no code implementations19 Feb 2020 Zifan Wang, Piotr Mardziel, Anupam Datta, Matt Fredrikson

In this work we expand the foundations of human-understandable concepts with which attributions can be interpreted beyond "importance" and its visualization; we incorporate the logical concepts of necessity and sufficiency, and the concept of proportionality.

Image Classification

Individual Fairness Revisited: Transferring Techniques from Adversarial Robustness

no code implementations18 Feb 2020 Samuel Yeom, Matt Fredrikson

We turn the definition of individual fairness on its head: rather than ascertaining the fairness of a model given a predetermined metric, we find a metric for a given model that satisfies individual fairness.

Adversarial Robustness Fairness

Fast Geometric Projections for Local Robustness Certification

no code implementations ICLR 2021 Aymeric Fromherz, Klas Leino, Matt Fredrikson, Bryan Parno, Corina Păsăreanu

Local robustness ensures that a model classifies all inputs within an $\ell_2$-ball consistently, which precludes various forms of adversarial inputs.

Stolen Memories: Leveraging Model Memorization for Calibrated White-Box Membership Inference

no code implementations27 Jun 2019 Klas Leino, Matt Fredrikson

Membership inference (MI) attacks exploit the fact that machine learning algorithms sometimes leak information about their training data through the learned model.

Memorization

Learning Fair Representations for Kernel Models

2 code implementations27 Jun 2019 Zilong Tan, Samuel Yeom, Matt Fredrikson, Ameet Talwalkar

In contrast, we demonstrate the promise of learning a model-aware fair representation, focusing on kernel-based models.

Dimensionality Reduction Fairness

FlipTest: Fairness Testing via Optimal Transport

1 code implementation21 Jun 2019 Emily Black, Samuel Yeom, Matt Fredrikson

We present FlipTest, a black-box technique for uncovering discrimination in classifiers.

Fairness Translation
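
A toy rendition of the idea, with a synthetic dataset and a placeholder classifier: match individuals across two groups by an exact small-sample optimal-transport matching and flag pairs whose predictions flip under the mapping. The paper approximates the transport map with a GAN so the approach scales and generalizes to unseen individuals.

```python
# FlipTest-style audit on toy data: optimal matching across groups, then count
# matched pairs whose model prediction flips.
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 1.0, size=(100, 5))        # e.g. protected group
group_b = rng.normal(0.3, 1.0, size=(100, 5))        # comparison group

# Exact optimal matching between the two equal-size samples (squared l2 cost).
cost = ((group_a[:, None, :] - group_b[None, :, :]) ** 2).sum(-1)
rows, cols = linear_sum_assignment(cost)

# A placeholder classifier under audit.
X = np.vstack([group_a, group_b])
y = (X.sum(axis=1) + rng.normal(0, 0.5, len(X)) > 0).astype(int)
clf = LogisticRegression(max_iter=1000).fit(X, y)

pred_a = clf.predict(group_a[rows])
pred_b = clf.predict(group_b[cols])
print("fraction of matched pairs whose prediction flips:", np.mean(pred_a != pred_b))
```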

Feature-Wise Bias Amplification

no code implementations ICLR 2019 Klas Leino, Emily Black, Matt Fredrikson, Shayak Sen, Anupam Datta

This overestimation gives rise to feature-wise bias amplification -- a previously unreported form of bias that can be traced back to the features of a trained model.

feature selection Inductive Bias

Hunting for Discriminatory Proxies in Linear Regression Models

1 code implementation NeurIPS 2018 Samuel Yeom, Anupam Datta, Matt Fredrikson

In this paper we formulate a definition of proxy use for the setting of linear regression and present algorithms for detecting proxies.

Attribute regression

Influence-Directed Explanations for Deep Convolutional Networks

2 code implementations ICLR 2018 Klas Leino, Shayak Sen, Anupam Datta, Matt Fredrikson, Linyi Li

We study the problem of explaining a rich class of behavioral properties of deep neural networks.

Case Study: Explaining Diabetic Retinopathy Detection Deep CNNs via Integrated Gradients

no code implementations27 Sep 2017 Linyi Li, Matt Fredrikson, Shayak Sen, Anupam Datta

In this report, we applied integrated gradients to explaining a neural network for diabetic retinopathy detection.

Diabetic Retinopathy Detection
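
A standard integrated-gradients computation on a toy model; the report applies the same recipe to a diabetic-retinopathy network.

```python
# Integrated gradients: (x - baseline) times the average gradient along the
# straight-line path from the baseline to the input.
import torch

def integrated_gradients(model, x, baseline, target, steps=50):
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, *([1] * x.dim()))
    path = baseline + alphas * (x - baseline)        # straight-line path
    path.requires_grad_(True)
    model(path)[:, target].sum().backward()
    avg_grad = path.grad.mean(dim=0)                 # average gradient along the path
    return (x - baseline) * avg_grad

model = torch.nn.Sequential(torch.nn.Linear(8, 16), torch.nn.ReLU(), torch.nn.Linear(16, 2))
x = torch.randn(8)
ig = integrated_gradients(model, x, baseline=torch.zeros_like(x), target=1)
```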

Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting

1 code implementation5 Sep 2017 Samuel Yeom, Irene Giacomelli, Matt Fredrikson, Somesh Jha

This paper examines the effect that overfitting and influence have on the ability of an attacker to learn information about the training data from machine learning models, either through training set membership inference or attribute inference attacks.

Attribute BIG-bench Machine Learning
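
A minimal loss-threshold membership inference experiment on a deliberately overfit model, as a toy rendition of the connection the paper analyzes; the dataset and model here are placeholders.

```python
# Loss-threshold membership inference: members of the training set tend to have
# lower per-example loss on an overfit model, so loss separates members from
# non-members (AUC well above 0.5 signals leakage).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_tr, y_tr, X_out, y_out = X[:200], y[:200], X[200:], y[200:]

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

def per_example_loss(model, X, y):
    p = np.clip(model.predict_proba(X)[np.arange(len(y)), y], 1e-12, 1.0)
    return -np.log(p)

losses = np.concatenate([per_example_loss(model, X_tr, y_tr),
                         per_example_loss(model, X_out, y_out)])
is_member = np.concatenate([np.ones(200), np.zeros(200)])
print("membership-inference AUC:", roc_auc_score(is_member, -losses))
```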

Proxy Non-Discrimination in Data-Driven Systems

3 code implementations25 Jul 2017 Anupam Datta, Matt Fredrikson, Gihyuk Ko, Piotr Mardziel, Shayak Sen

Machine-learnt systems inherit biases against protected classes (historically disparaged groups) from training data.

The Limitations of Deep Learning in Adversarial Settings

11 code implementations24 Nov 2015 Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, Ananthram Swami

In this work, we formalize the space of adversaries against deep neural networks (DNNs) and introduce a novel class of algorithms to craft adversarial samples based on a precise understanding of the mapping between inputs and outputs of DNNs.

Adversarial Attack Adversarial Defense
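
A simplified single step in the spirit of the saliency-map attack introduced here: use the Jacobian of the logits with respect to the input to pick the feature that most helps the target class while hurting the others, then perturb it. The full attack iterates and perturbs feature pairs; the model below is a toy stand-in.

```python
# One saliency-map attack step on a toy model.
import torch

def saliency_step(model, x, target, theta=0.5):
    jac = torch.autograd.functional.jacobian(lambda inp: model(inp.unsqueeze(0))[0], x)
    d_target = jac[target]                  # d logit_target / d x
    d_others = jac.sum(dim=0) - d_target    # summed gradients of the other logits
    saliency = torch.where((d_target > 0) & (d_others < 0),
                           d_target * d_others.abs(),
                           torch.zeros_like(d_target))
    x_adv = x.clone()
    x_adv[saliency.argmax()] += theta       # push the most salient feature
    return x_adv

model = torch.nn.Sequential(torch.nn.Linear(10, 20), torch.nn.ReLU(), torch.nn.Linear(20, 3))
x_adv = saliency_step(model, torch.randn(10), target=2)
```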
