Search Results for author: Eric Wong

Found 29 papers, 24 papers with code

Adversarial Robustness Against the Union of Multiple Threat Models

1 code implementation ICML 2020 Pratyush Maini, Eric Wong, Zico Kolter

Owing to the susceptibility of deep learning systems to adversarial attacks, there has been a great deal of work in developing (both empirically and certifiably) robust classifiers.

Adversarial Robustness

Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing

1 code implementation 25 Feb 2024 Jiabao Ji, Bairu Hou, Alexander Robey, George J. Pappas, Hamed Hassani, Yang Zhang, Eric Wong, Shiyu Chang

Aligned large language models (LLMs) are vulnerable to jailbreaking attacks, which bypass the safeguards of targeted LLMs and fool them into generating objectionable content.

Instruction Following

Initialization Matters for Adversarial Transfer Learning

no code implementations 10 Dec 2023 Andong Hua, Jindong Gu, Zhiyu Xue, Nicholas Carlini, Eric Wong, Yao Qin

Specifically, we reveal that with a standard pretrained model, Parameter-Efficient Finetuning (PEFT) methods either fail to be adversarially robust or continue to exhibit significantly degraded adversarial robustness on downstream tasks, even with adversarial training during finetuning.

Adversarial Robustness, Image Classification +1

Sum-of-Parts Models: Faithful Attributions for Groups of Features

1 code implementation 25 Oct 2023 Weiqiu You, Helen Qu, Marco Gatti, Bhuvnesh Jain, Eric Wong

An explanation of a machine learning model is considered "faithful" if it accurately reflects the model's decision-making process.

Decision Making

SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation

1 code implementation 19 Oct 2023 Chongyu Fan, Jiancheng Liu, Yihua Zhang, Eric Wong, Dennis Wei, Sijia Liu

To address these challenges, we introduce the concept of 'weight saliency' for MU, drawing parallels with input saliency in model explanation.

Image Classification, Image Generation +1
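As a rough illustration of the "weight saliency" idea mentioned in the snippet above, the sketch below builds a gradient-based saliency mask over model parameters. The function names, threshold rule, and loop are assumptions for illustration only, not the released SalUn code, and the full unlearning pipeline involves more than this mask.

```python
# Illustrative sketch (not the official SalUn code): build a weight-saliency
# mask by thresholding the magnitude of the forgetting-loss gradient, then use
# the mask to restrict which parameters get updated during unlearning.
import torch
import torch.nn as nn

def weight_saliency_mask(model: nn.Module, forget_loader, loss_fn,
                         sparsity: float = 0.5, device: str = "cpu"):
    model.zero_grad()
    for x, y in forget_loader:
        x, y = x.to(device), y.to(device)
        loss_fn(model(x), y).backward()  # accumulate gradients over the forget set

    grads = torch.cat([p.grad.abs().flatten()
                       for p in model.parameters() if p.grad is not None])
    threshold = torch.quantile(grads, sparsity)  # keep the most salient weights

    masks = {}
    for name, p in model.named_parameters():
        if p.grad is not None:
            masks[name] = (p.grad.abs() >= threshold).float()
    model.zero_grad()
    return masks  # multiply parameter updates by these masks during unlearning
```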

Jailbreaking Black Box Large Language Models in Twenty Queries

1 code implementation 12 Oct 2023 Patrick Chao, Alexander Robey, Edgar Dobriban, Hamed Hassani, George J. Pappas, Eric Wong

PAIR, which is inspired by social engineering attacks, uses an attacker LLM to automatically generate jailbreaks for a separate targeted LLM without human intervention.
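The query loop behind this kind of attack can be sketched as follows. Here `attacker`, `target`, and `judge` are placeholder callables standing in for real LLM calls; the actual PAIR prompts, judge, and stopping rule differ.

```python
# Illustrative sketch of an attacker-LLM-in-the-loop jailbreak search in the
# spirit of PAIR. The three callables are placeholders for real LLM queries.
from typing import Callable

def pair_style_attack(goal: str,
                      attacker: Callable[[str], str],
                      target: Callable[[str], str],
                      judge: Callable[[str, str], float],
                      max_queries: int = 20,
                      threshold: float = 0.9):
    history = f"Objective: {goal}\n"
    for i in range(max_queries):
        prompt = attacker(history)          # attacker LLM proposes a candidate jailbreak
        response = target(prompt)           # query the black-box target LLM
        score = judge(goal, response)       # rate how close the response is to the goal
        if score >= threshold:
            return prompt, response, i + 1  # succeeded within the query budget
        # Feed the outcome back so the attacker can refine its next attempt.
        history += f"\nAttempt {i + 1}: {prompt}\nResponse: {response}\nScore: {score:.2f}\n"
    return None, None, max_queries
```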

Comparing Styles across Languages

1 code implementation 11 Oct 2023 Shreya Havaldar, Matthew Pressimone, Eric Wong, Lyle Ungar

Understanding how styles differ across languages is advantageous for training both humans and computers to generate culturally appropriate text.

SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks

1 code implementation 5 Oct 2023 Alexander Robey, Eric Wong, Hamed Hassani, George J. Pappas

Despite efforts to align large language models (LLMs) with human values, widely-used LLMs such as GPT, Llama, Claude, and PaLM are susceptible to jailbreaking attacks, wherein an adversary fools a targeted LLM into generating objectionable content.
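The defense can be illustrated with a perturb-and-aggregate sketch like the one below. The character-level perturbation scheme, the majority-vote rule, and the helper callables are simplifying assumptions, not the paper's exact procedure.

```python
# Illustrative sketch in the spirit of SmoothLLM: randomly perturb several
# copies of the incoming prompt, query the LLM on each copy, and aggregate
# the responses by majority vote on whether they were refusals.
import random
import string
from typing import Callable

def perturb(prompt: str, rate: float = 0.1) -> str:
    chars = list(prompt)
    for i in range(len(chars)):
        if random.random() < rate:
            chars[i] = random.choice(string.printable)  # random character swap
    return "".join(chars)

def smoothed_generate(prompt: str,
                      generate: Callable[[str], str],     # the underlying LLM
                      is_refusal: Callable[[str], bool],  # e.g. keyword-based check
                      num_copies: int = 10) -> str:
    responses = [generate(perturb(prompt)) for _ in range(num_copies)]
    refusals = [r for r in responses if is_refusal(r)]
    # Follow the majority: if most perturbed copies triggered a refusal,
    # return one of the refusals; otherwise return a normal response.
    if len(refusals) > num_copies / 2:
        return random.choice(refusals)
    return random.choice([r for r in responses if not is_refusal(r)])
```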

TopEx: Topic-based Explanations for Model Comparison

no code implementations 1 Jun 2023 Shreya Havaldar, Adam Stein, Eric Wong, Lyle Ungar

Meaningfully comparing language models is challenging with current explanation methods.

Rectifying Group Irregularities in Explanations for Distribution Shift

no code implementations 25 May 2023 Adam Stein, Yinjun Wu, Eric Wong, Mayur Naik

It is well-known that real-world changes constituting distribution shift adversely affect model performance.

Do Machine Learning Models Learn Statistical Rules Inferred from Data?

1 code implementation 2 Mar 2023 Aaditya Naik, Yinjun Wu, Mayur Naik, Eric Wong

Test-time adaptation reduces these violations by up to 68.7% with relative performance improvement up to 32%.

Common Sense Reasoning, Imputation +3

In-context Example Selection with Influences

1 code implementation 21 Feb 2023 Tai Nguyen, Eric Wong

In-context learning (ICL) is a powerful paradigm that has emerged from large language models (LLMs).

In-Context Learning

Black Box Adversarial Prompting for Foundation Models

1 code implementation 8 Feb 2023 Natalie Maus, Patrick Chao, Eric Wong, Jacob Gardner

Prompting interfaces allow users to quickly adjust the output of generative models in both vision and language.

Text Generation

Faithful Chain-of-Thought Reasoning

1 code implementation 31 Jan 2023 Qing Lyu, Shreya Havaldar, Adam Stein, Li Zhang, Delip Rao, Eric Wong, Marianna Apidianaki, Chris Callison-Burch

While Chain-of-Thought (CoT) prompting boosts Language Models' (LM) performance on a gamut of complex reasoning tasks, the generated reasoning chain does not necessarily reflect how the model arrives at the answer (aka. faithfulness).

Math, Multi-hop Question Answering +1

A Data-Based Perspective on Transfer Learning

1 code implementation CVPR 2023 Saachi Jain, Hadi Salman, Alaa Khaddaj, Eric Wong, Sung Min Park, Aleksander Madry

It is commonly believed that, in transfer learning, including more pre-training data translates into better performance.

Transfer Learning

When does Bias Transfer in Transfer Learning?

1 code implementation 6 Jul 2022 Hadi Salman, Saachi Jain, Andrew Ilyas, Logan Engstrom, Eric Wong, Aleksander Madry

Using transfer learning to adapt a pre-trained "source model" to a downstream "target task" can dramatically increase performance with seemingly no downside.

Transfer Learning

Missingness Bias in Model Debugging

1 code implementation ICLR 2022 Saachi Jain, Hadi Salman, Eric Wong, Pengchuan Zhang, Vibhav Vineet, Sai Vemprala, Aleksander Madry

Missingness, or the absence of features from an input, is a concept fundamental to many model debugging tools.

Certified Patch Robustness via Smoothed Vision Transformers

1 code implementation CVPR 2022 Hadi Salman, Saachi Jain, Eric Wong, Aleksander Mądry

Certified patch defenses can guarantee robustness of an image classifier to arbitrary changes within a bounded contiguous region.

DeepSplit: Scalable Verification of Deep Neural Networks via Operator Splitting

1 code implementation 16 Jun 2021 Shaoru Chen, Eric Wong, J. Zico Kolter, Mahyar Fazlyab

Analyzing the worst-case performance of deep neural networks against input perturbations amounts to solving a large-scale non-convex optimization problem, for which several past works have proposed convex relaxations as a promising alternative.

Image Classification

Leveraging Sparse Linear Layers for Debuggable Deep Networks

2 code implementations 11 May 2021 Eric Wong, Shibani Santurkar, Aleksander Mądry

We show how fitting sparse linear models over learned deep feature representations can lead to more debuggable neural networks.
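A minimal sketch of this recipe, assuming a frozen feature extractor and scikit-learn: the paper's pipeline fits elastic-net regularized GLMs along a full regularization path, which this simplified L1 readout only approximates.

```python
# Illustrative sketch: fit a sparse (L1-regularized) linear classifier on top
# of frozen deep features, then inspect which features drive each class.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_sparse_readout(features: np.ndarray, labels: np.ndarray, C: float = 0.01):
    # Smaller C means stronger sparsity, i.e. fewer active deep features.
    clf = LogisticRegression(penalty="l1", solver="saga", C=C, max_iter=5000)
    clf.fit(features, labels)
    return clf

def top_features_per_class(clf, k: int = 5):
    # Assumes a multi-class problem, where coef_ has one row per class.
    # The largest-magnitude weights point to the deep features worth debugging.
    return {c: np.argsort(-np.abs(w))[:k] for c, w in zip(clf.classes_, clf.coef_)}
```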

Learning perturbation sets for robust machine learning

1 code implementation ICLR 2021 Eric Wong, J. Zico Kolter

In this paper, we aim to bridge this gap by learning perturbation sets from data, in order to characterize real-world effects for robust training and evaluation.

BIG-bench Machine Learning

Neural Network Virtual Sensors for Fuel Injection Quantities with Provable Performance Specifications

no code implementations 30 Jun 2020 Eric Wong, Tim Schneider, Joerg Schmitt, Frank R. Schmidt, J. Zico Kolter

Additionally, we show how specific intervals of fuel injection quantities can be targeted to maximize robustness for certain ranges, allowing us to train a virtual sensor for fuel injection which is provably guaranteed to have at most 10.69% relative error under noise while maintaining 3% relative error on non-adversarial data within normalized fuel injection ranges of 0.6 to 1.0.

Overfitting in adversarially robust deep learning

4 code implementations ICML 2020 Leslie Rice, Eric Wong, J. Zico Kolter

Based upon this observed effect, we show that the performance gains of virtually all recent algorithmic improvements upon adversarial training can be matched by simply using early stopping.

Data Augmentation
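The early-stopping recipe implied here can be sketched as follows; `train_one_epoch` and `robust_accuracy` (e.g., a PGD evaluation on a held-out split) are placeholder callables, not the paper's released code.

```python
# Illustrative sketch: during adversarial training, track robust accuracy on a
# held-out validation set after every epoch and keep the best checkpoint
# rather than the final one, since robust accuracy tends to peak early.
import copy

def adversarial_training_with_early_stopping(model, train_loader, val_loader,
                                             train_one_epoch, robust_accuracy,
                                             epochs: int = 100):
    best_acc = 0.0
    best_state = copy.deepcopy(model.state_dict())
    for _ in range(epochs):
        train_one_epoch(model, train_loader)      # one adversarial-training epoch
        acc = robust_accuracy(model, val_loader)  # robust accuracy under attack
        if acc > best_acc:
            best_acc = acc
            best_state = copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)             # roll back to the best checkpoint
    return model, best_acc
```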

Fast is better than free: Revisiting adversarial training

10 code implementations ICLR 2020 Eric Wong, Leslie Rice, J. Zico Kolter

Furthermore, we show that FGSM adversarial training can be further accelerated by using standard techniques for efficient training of deep networks, allowing us to learn a robust CIFAR10 classifier with 45% robust accuracy to PGD attacks with $\epsilon=8/255$ in 6 minutes, and a robust ImageNet classifier with 43% robust accuracy at $\epsilon=2/255$ in 12 hours, in comparison to past work based on "free" adversarial training which took 10 and 50 hours to reach the same respective thresholds.
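A minimal PyTorch sketch of a single FGSM-with-random-initialization training step in this spirit; the step size, data range, and helper names are illustrative defaults rather than the paper's exact configuration.

```python
# Illustrative sketch of "fast" FGSM adversarial training: start from a random
# point inside the epsilon ball, take one FGSM step, and train on the
# resulting perturbed batch. Assumes inputs are scaled to [0, 1].
import torch
import torch.nn.functional as F

def fgsm_training_step(model, x, y, optimizer, eps=8 / 255, alpha=10 / 255):
    # Random initialization inside the L-infinity ball of radius eps.
    delta = torch.empty_like(x).uniform_(-eps, eps)
    delta = torch.clamp(x + delta, 0.0, 1.0) - x
    delta.requires_grad_(True)

    # Single FGSM step on the perturbation.
    loss = F.cross_entropy(model(x + delta), y)
    grad = torch.autograd.grad(loss, delta)[0]
    delta = torch.clamp(delta + alpha * grad.sign(), -eps, eps)
    delta = (torch.clamp(x + delta, 0.0, 1.0) - x).detach()

    # Standard training update on the adversarially perturbed batch.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x + delta), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```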

Adversarial Robustness Against the Union of Multiple Perturbation Models

1 code implementation 9 Sep 2019 Pratyush Maini, Eric Wong, J. Zico Kolter

Owing to the susceptibility of deep learning systems to adversarial attacks, there has been a great deal of work in developing (both empirically and certifiably) robust classifiers.

Adversarial Robustness

Wasserstein Adversarial Examples via Projected Sinkhorn Iterations

2 code implementations 21 Feb 2019 Eric Wong, Frank R. Schmidt, J. Zico Kolter

In this paper, we propose a new threat model for adversarial attacks based on the Wasserstein distance.

Adversarial Attack, Adversarial Defense +4

Scaling provable adversarial defenses

4 code implementations NeurIPS 2018 Eric Wong, Frank R. Schmidt, Jan Hendrik Metzen, J. Zico Kolter

Recent work has developed methods for learning deep network classifiers that are provably robust to norm-bounded adversarial perturbation; however, these methods are currently only possible for relatively small feedforward networks.

Provable defenses against adversarial examples via the convex outer adversarial polytope

8 code implementations ICML 2018 Eric Wong, J. Zico Kolter

We propose a method to learn deep ReLU-based classifiers that are provably robust against norm-bounded adversarial perturbations on the training data.

Adversarial Attack
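To make "provably robust to norm-bounded perturbations" concrete, the sketch below uses interval bound propagation, a simpler and looser certification scheme than the paper's convex outer-polytope dual; it is for illustration only and is not the paper's method.

```python
# Illustrative sketch (interval bound propagation, NOT the paper's dual/convex
# relaxation): propagate elementwise lower/upper bounds of an L-infinity ball
# through Linear+ReLU layers and check that the true class provably wins.
import torch
import torch.nn as nn

def interval_bounds(layers, x, eps):
    lower, upper = x - eps, x + eps
    for layer in layers:
        if isinstance(layer, nn.Linear):
            center, radius = (lower + upper) / 2, (upper - lower) / 2
            center = layer(center)                      # W c + b
            radius = radius @ layer.weight.abs().t()    # |W| propagates the radius
            lower, upper = center - radius, center + radius
        elif isinstance(layer, nn.ReLU):
            lower, upper = lower.clamp(min=0), upper.clamp(min=0)
    return lower, upper

def certified(layers, x, y, eps):
    """x: a single input of shape (1, d); y: integer true label."""
    lower, upper = interval_bounds(layers, x, eps)
    worst_true = lower[0, y]
    others_best = torch.cat([upper[0, :y], upper[0, y + 1:]]).max()
    # Robust if the worst-case true-class logit beats every other class's best case.
    return bool(worst_true > others_best)
```

The paper instead derives tighter bounds from the dual of a convex relaxation and trains against them; the sketch only shows the shape of a certificate, namely worst-case true-class logit versus best-case competitor.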
