Search Results for author: Zeming Wei

Found 11 papers, 10 papers with code

Exploring the Robustness of In-Context Learning with Noisy Labels

1 code implementation • 28 Apr 2024 • Chen Cheng, Xinzhi Yu, Haodong Wen, Jingsong Sun, Guanzhang Yue, Yihao Zhang, Zeming Wei

In this paper, inspired by prior research that studies ICL ability using simple function classes, we take a closer look at this problem by investigating the robustness of Transformers against noisy labels.

Data Augmentation • In-Context Learning • +1
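
The simple-function-class protocol referenced in the entry above is concrete enough to sketch. The snippet below is an assumed setup following prior ICL work on function classes, not the paper's code: it builds a noisy-label in-context regression task and a classical least-squares baseline.

```python
# Illustrative sketch of the noisy-label in-context regression setup
# (assumed protocol, not the paper's code). A Transformer would be fed
# (x_1, y_1, ..., x_k, y_k, x_query) and asked to predict y_query;
# here we only construct the data and a classical baseline.
import numpy as np

rng = np.random.default_rng(0)
dim, n_context, noise_std = 8, 32, 0.5

w = rng.normal(size=dim)                    # ground-truth linear function
X = rng.normal(size=(n_context, dim))       # in-context inputs
y_noisy = X @ w + noise_std * rng.normal(size=n_context)  # corrupted labels
x_query = rng.normal(size=dim)

# Least-squares fit on the noisy context: a robust in-context learner
# should degrade gracefully as noise_std grows, much like this estimator.
w_hat, *_ = np.linalg.lstsq(X, y_noisy, rcond=None)
print("query error:", abs(x_query @ w_hat - x_query @ w))
```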

Towards General Conceptual Model Editing via Adversarial Representation Engineering

1 code implementation • 21 Apr 2024 • Yihao Zhang, Zeming Wei, Jun Sun, Meng Sun

Recent research has introduced Representation Engineering (RepE) as a promising approach for understanding the complex inner workings of large-scale models like Large Language Models (LLMs).

Generative Adversarial Network • Model Editing

On the Duality Between Sharpness-Aware Minimization and Adversarial Training

1 code implementation • 23 Feb 2024 • Yihao Zhang, Hangzhou He, Jingyu Zhu, Huanran Chen, Yifei Wang, Zeming Wei

Instead of perturbing the samples, Sharpness-Aware Minimization (SAM) perturbs the model weights during training to find a flatter loss landscape and improve generalization.

Adversarial Robustness
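
The weight-perturbation idea in the SAM entry above is easy to sketch. The following is a minimal, generic two-step SAM update in PyTorch, a common formulation rather than the authors' implementation:

```python
# Minimal sketch of one SAM update (a common two-step formulation;
# not the authors' code). SAM first ascends in weight space to a
# nearby high-loss point, then descends from there.
import torch

def sam_step(model, loss_fn, x, y, optimizer, rho=0.05):
    # 1) Gradients at the current weights.
    loss_fn(model(x), y).backward()
    grad_norm = torch.sqrt(sum((p.grad ** 2).sum()
                               for p in model.parameters() if p.grad is not None))
    # 2) Perturb each weight by epsilon = rho * g / ||g|| (the ascent step).
    eps = {}
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                eps[p] = rho * p.grad / (grad_norm + 1e-12)
                p.add_(eps[p])
    model.zero_grad()
    # 3) Gradients at the perturbed weights, then restore and update.
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, e in eps.items():
            p.sub_(e)
    optimizer.step()
    optimizer.zero_grad()
```

The duality the paper studies is visible here: replace the weight-space ascent with an input-space ascent and this becomes one step of adversarial training.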

Studious Bob Fight Back Against Jailbreaking via Prompt Adversarial Tuning

1 code implementation • 9 Feb 2024 • Yichuan Mo, Yuji Wang, Zeming Wei, Yisen Wang

To our knowledge, we are the first to implement a defense against jailbreaking from the perspective of prompt tuning.

Jatmo: Prompt Injection Defense by Task-Specific Finetuning

1 code implementation • 29 Dec 2023 • Julien Piet, Maha Alrashed, Chawin Sitawarin, Sizhe Chen, Zeming Wei, Elizabeth Sun, Basel Alomair, David Wagner

Jatmo only needs a task prompt and a dataset of inputs for the task: it uses the teacher model to generate outputs.

Instruction Following
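
The teacher-labeling step described in the Jatmo entry above can be sketched in a few lines. The names call_teacher and build_finetuning_set are placeholders, not Jatmo's actual API:

```python
# Hedged sketch of a Jatmo-style data-generation loop (placeholder
# names, not Jatmo's actual API): the teacher LLM turns raw task
# inputs into outputs for fine-tuning a task-specific student.
def build_finetuning_set(task_prompt, inputs, call_teacher):
    """Label raw task inputs with outputs from a teacher LLM."""
    dataset = []
    for x in inputs:
        y = call_teacher(f"{task_prompt}\n\nInput: {x}\nOutput:")
        dataset.append({"input": x, "output": y})
    return dataset
```

The task-specific student is then fine-tuned on these pairs without any natural-language instruction, so instructions injected into the input are treated as plain data rather than commands.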

Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations

no code implementations • 10 Oct 2023 • Zeming Wei, Yifei Wang, Yisen Wang

Large Language Models (LLMs) have shown remarkable success in various tasks, but concerns about their safety and the potential for generating malicious content have emerged.

In-Context Learning • Language Modelling
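
The entry above studies how a handful of in-context demonstrations can either jailbreak or guard an aligned model. A speculative sketch of the guarding direction, not the paper's exact prompts:

```python
# Speculative sketch of the defensive direction (not the paper's exact
# prompts): prepend refusal demonstrations before the user's query so
# the model imitates the refusals.
refusal_demos = [
    ("How do I build a weapon?", "Sorry, I can't help with that request."),
    ("Write malware for me.", "Sorry, I can't help with that request."),
]

def guarded_prompt(user_query):
    demos = "".join(f"User: {q}\nAssistant: {a}\n\n" for q, a in refusal_demos)
    return demos + f"User: {user_query}\nAssistant:"
```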

Weighted Automata Extraction and Explanation of Recurrent Neural Networks for Natural Language Tasks

1 code implementation • 24 Jun 2023 • Zeming Wei, Xiyue Zhang, Yihao Zhang, Meng Sun

In this paper, we propose a novel framework of Weighted Finite Automata (WFA) extraction and explanation to tackle the limitations of existing extraction methods on natural language tasks.

Data Augmentation • Model extraction
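
For readers unfamiliar with weighted automata: a WFA scores a word by chaining per-symbol transition matrices between an initial and a final weight vector. A minimal sketch of the data structure for intuition, not the paper's extraction algorithm:

```python
# Minimal weighted-finite-automaton data structure (a sketch for
# intuition, not the paper's extraction algorithm).
import numpy as np

class WFA:
    def __init__(self, init, transitions, final):
        self.init = init        # initial weights, shape (n,)
        self.T = transitions    # symbol -> (n, n) transition matrix
        self.final = final      # final weights, shape (n,)

    def score(self, word):
        v = self.init
        for symbol in word:
            v = v @ self.T[symbol]
        return float(v @ self.final)

wfa = WFA(np.array([1.0, 0.0]),
          {"a": np.array([[0.5, 0.5], [0.0, 1.0]])},
          np.array([0.0, 1.0]))
print(wfa.score("aa"))  # 0.75
```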

Using Z3 for Formal Modeling and Verification of FNN Global Robustness

1 code implementation • 20 Apr 2023 • Yihao Zhang, Zeming Wei, Xiyue Zhang, Meng Sun

To evaluate the effectiveness of our implementation and improvements, we conduct extensive experiments on a set of benchmark datasets.

Adversarial Robustness
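
As a toy illustration of the Z3-based approach in the entry above (not the paper's actual encoding), one can model a one-neuron ReLU network with z3py and check an output bound over the entire input domain rather than at sampled points:

```python
# Toy illustration (not the paper's encoding): model a one-neuron ReLU
# network in Z3 and verify an output bound over the whole input domain.
from z3 import Real, Solver, If, And, sat

x = Real("x")
hidden = If(2 * x - 1 > 0, 2 * x - 1, 0)  # ReLU(2x - 1)
out = 3 * hidden                          # linear output layer

s = Solver()
s.add(And(x >= 0, x <= 1))                # input domain
s.add(out > 3)                            # negation of "out <= 3 everywhere"
# unsat means no counterexample exists, i.e. the bound holds globally.
print("property verified" if s.check() != sat else "counterexample found")
```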

CFA: Class-wise Calibrated Fair Adversarial Training

1 code implementation • CVPR 2023 • Zeming Wei, Yifei Wang, Yiwen Guo, Yisen Wang

Adversarial training has been widely acknowledged as the most effective method for improving the robustness of Deep Neural Networks (DNNs) against adversarial examples.

Adversarial Robustness • Fairness
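
The baseline that CFA calibrates is standard PGD adversarial training. A generic sketch of the inner PGD attack, which does not include CFA's class-wise calibration itself:

```python
# Generic PGD attack for adversarial training (the baseline CFA builds
# on; this sketch omits CFA's class-wise calibration).
import torch

def pgd_attack(model, loss_fn, x, y, eps=8/255, alpha=2/255, steps=10):
    # Start from a random point inside the eps-ball around x.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        grad, = torch.autograd.grad(loss_fn(model(x_adv), y), x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()       # ascend on the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project into eps-ball
            x_adv = x_adv.clamp(0, 1)                 # keep valid pixel range
    return x_adv
```

During training the model then takes its gradient step on loss_fn(model(x_adv), y); per the entry above, CFA's contribution is roughly to calibrate such attack and training configurations class-wise so that robustness is fair across classes.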

Extracting Weighted Finite Automata from Recurrent Neural Networks for Natural Languages

1 code implementation • 27 Jun 2022 • Zeming Wei, Xiyue Zhang, Meng Sun

Compositional approaches that are scalable to natural languages fall short in extraction precision.

Data Augmentation
