Search Results for author: Zeming Wei

Found 9 papers, 8 papers with code

On the Duality Between Sharpness-Aware Minimization and Adversarial Training

1 code implementation • 23 Feb 2024 • Yihao Zhang, Hangzhou He, Jingyu Zhu, Huanran Chen, Yifei Wang, Zeming Wei

Instead of perturbing the samples, Sharpness-Aware Minimization (SAM) perturbs the model weights during training to find a flatter loss landscape and improve generalization.

Adversarial Robustness

Studious Bob Fight Back Against Jailbreaking via Prompt Adversarial Tuning

1 code implementation • 9 Feb 2024 • Yichuan Mo, Yuji Wang, Zeming Wei, Yisen Wang

To our knowledge, we are the first to implement defense from the perspective of prompt tuning.

Jatmo: Prompt Injection Defense by Task-Specific Finetuning

1 code implementation • 29 Dec 2023 • Julien Piet, Maha Alrashed, Chawin Sitawarin, Sizhe Chen, Zeming Wei, Elizabeth Sun, Basel Alomair, David Wagner

Jatmo only needs a task prompt and a dataset of inputs for the task: it uses the teacher model to generate outputs.

Instruction Following

Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations

no code implementations • 10 Oct 2023 • Zeming Wei, Yifei Wang, Yisen Wang

Large Language Models (LLMs) have shown remarkable success in various tasks, but concerns about their safety and the potential for generating malicious content have emerged.

In-Context Learning, Language Modelling

Weighted Automata Extraction and Explanation of Recurrent Neural Networks for Natural Language Tasks

1 code implementation • 24 Jun 2023 • Zeming Wei, Xiyue Zhang, Yihao Zhang, Meng Sun

In this paper, we propose a novel framework of Weighted Finite Automata (WFA) extraction and explanation to tackle the limitations for natural language tasks.

Data Augmentation, Model extraction

Using Z3 for Formal Modeling and Verification of FNN Global Robustness

1 code implementation • 20 Apr 2023 • Yihao Zhang, Zeming Wei, Xiyue Zhang, Meng Sun

To evaluate the effectiveness of our implementation and improvements, we conduct extensive experiments on a set of benchmark datasets.

Adversarial Robustness

CFA: Class-wise Calibrated Fair Adversarial Training

1 code implementation • CVPR 2023 • Zeming Wei, Yifei Wang, Yiwen Guo, Yisen Wang

Adversarial training has been widely acknowledged as the most effective method to improve the adversarial robustness against adversarial examples for Deep Neural Networks (DNNs).

Adversarial Robustness, Fairness

Extracting Weighted Finite Automata from Recurrent Neural Networks for Natural Languages

1 code implementation • 27 Jun 2022 • Zeming Wei, Xiyue Zhang, Meng Sun

Compositional approaches that are scalable to natural languages fall short in extraction precision.

Data Augmentation
