Search Results for author: Zeming Wei

Found 11 papers, 10 papers with code

Exploring the Robustness of In-Context Learning with Noisy Labels

1 code implementation • 28 Apr 2024 • Chen Cheng, Xinzhi Yu, Haodong Wen, Jingsong Sun, Guanzhang Yue, Yihao Zhang, Zeming Wei

In this paper, inspired by prior research that studies ICL ability using simple function classes, we take a closer look at this problem by investigating the robustness of Transformers against noisy labels.

Data Augmentation • In-Context Learning • +1
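
The simple-function-class protocol referenced in the entry above is concrete enough to sketch. The snippet below is an assumed setup following prior ICL work on function classes, not the paper's code: it builds a noisy-label in-context regression task and a classical least-squares baseline.

```python
# Illustrative sketch of the noisy-label in-context regression setup
# (assumed protocol, not the paper's code). A Transformer would be fed
# (x_1, y_1, ..., x_k, y_k, x_query) and asked to predict y_query;
# here we only construct the data and a classical baseline.
import numpy as np

rng = np.random.default_rng(0)
dim, n_context, noise_std = 8, 32, 0.5

w = rng.normal(size=dim)                    # ground-truth linear function
X = rng.normal(size=(n_context, dim))       # in-context inputs
y_noisy = X @ w + noise_std * rng.normal(size=n_context)  # corrupted labels
x_query = rng.normal(size=dim)

# Least-squares fit on the noisy context: a robust in-context learner
# should degrade gracefully as noise_std grows, much like this estimator.
w_hat, *_ = np.linalg.lstsq(X, y_noisy, rcond=None)
print("query error:", abs(x_query @ w_hat - x_query @ w))
```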

Towards General Conceptual Model Editing via Adversarial Representation Engineering

1 code implementation • 21 Apr 2024 • Yihao Zhang, Zeming Wei, Jun Sun, Meng Sun

Recent research has introduced Representation Engineering (RepE) as a promising approach for understanding the complex inner workings of large-scale models like Large Language Models (LLMs).

Generative Adversarial Network • Model Editing

On the Duality Between Sharpness-Aware Minimization and Adversarial Training

1 code implementation • 23 Feb 2024 • Yihao Zhang, Hangzhou He, Jingyu Zhu, Huanran Chen, Yifei Wang, Zeming Wei

Instead of perturbing the samples, Sharpness-Aware Minimization (SAM) perturbs the model weights during training to find a flatter loss landscape and improve generalization.

Adversarial Robustness
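
The weight-perturbation idea in the SAM entry above is easy to sketch. The following is a minimal, generic two-step SAM update in PyTorch, a common formulation rather than the authors' implementation:

```python
# Minimal sketch of one SAM update (a common two-step formulation;
# not the authors' code). SAM first ascends in weight space to a
# nearby high-loss point, then descends from there.
import torch

def sam_step(model, loss_fn, x, y, optimizer, rho=0.05):
    # 1) Gradients at the current weights.
    loss_fn(model(x), y).backward()
    grad_norm = torch.sqrt(sum((p.grad ** 2).sum()
                               for p in model.parameters() if p.grad is not None))
    # 2) Perturb each weight by epsilon = rho * g / ||g|| (the ascent step).
    eps = {}
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                eps[p] = rho * p.grad / (grad_norm + 1e-12)
                p.add_(eps[p])
    model.zero_grad()
    # 3) Gradients at the perturbed weights, then restore and update.
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, e in eps.items():
            p.sub_(e)
    optimizer.step()
    optimizer.zero_grad()
```

The duality the paper studies is visible here: replace the weight-space ascent with an input-space ascent and this becomes one step of adversarial training.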

Studious Bob Fight Back Against Jailbreaking via Prompt Adversarial Tuning

1 code implementation • 9 Feb 2024 • Yichuan Mo, Yuji Wang, Zeming Wei, Yisen Wang

To our knowledge, we are the first to implement a defense against jailbreaking from the perspective of prompt tuning.

Jatmo: Prompt Injection Defense by Task-Specific Finetuning

1 code implementation • 29 Dec 2023 • Julien Piet, Maha Alrashed, Chawin Sitawarin, Sizhe Chen, Zeming Wei, Elizabeth Sun, Basel Alomair, David Wagner

Jatmo only needs a task prompt and a dataset of inputs for the task: it uses the teacher model to generate outputs.

Instruction Following
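
The teacher-labeling step described in the Jatmo entry above can be sketched in a few lines. The names call_teacher and build_finetuning_set are placeholders, not Jatmo's actual API:

```python
# Hedged sketch of a Jatmo-style data-generation loop (placeholder
# names, not Jatmo's actual API): the teacher LLM turns raw task
# inputs into outputs for fine-tuning a task-specific student.
def build_finetuning_set(task_prompt, inputs, call_teacher):
    """Label raw task inputs with outputs from a teacher LLM."""
    dataset = []
    for x in inputs:
        y = call_teacher(f"{task_prompt}\n\nInput: {x}\nOutput:")
        dataset.append({"input": x, "output": y})
    return dataset
```

The task-specific student is then fine-tuned on these pairs without any natural-language instruction, so instructions injected into the input are treated as plain data rather than commands.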

Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations

no code implementations • 10 Oct 2023 • Zeming Wei, Yifei Wang, Yisen Wang

Large Language Models (LLMs) have shown remarkable success in various tasks, but concerns about their safety and the potential for generating malicious content have emerged.

In-Context Learning • Language Modelling
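
The entry above studies how a handful of in-context demonstrations can either jailbreak or guard an aligned model. A speculative sketch of the guarding direction, not the paper's exact prompts:

```python
# Speculative sketch of the defensive direction (not the paper's exact
# prompts): prepend refusal demonstrations before the user's query so
# the model imitates the refusals.
refusal_demos = [
    ("How do I build a weapon?", "Sorry, I can't help with that request."),
    ("Write malware for me.", "Sorry, I can't help with that request."),
]

def guarded_prompt(user_query):
    demos = "".join(f"User: {q}\nAssistant: {a}\n\n" for q, a in refusal_demos)
    return demos + f"User: {user_query}\nAssistant:"
```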

Weighted Automata Extraction and Explanation of Recurrent Neural Networks for Natural Language Tasks

1 code implementation • 24 Jun 2023 • Zeming Wei, Xiyue Zhang, Yihao Zhang, Meng Sun

In this paper, we propose a novel framework of Weighted Finite Automata (WFA) extraction and explanation to tackle the limitations of existing extraction methods on natural language tasks.

Data Augmentation • Model extraction
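
For readers unfamiliar with weighted automata: a WFA scores a word by chaining per-symbol transition matrices between an initial and a final weight vector. A minimal sketch of the data structure for intuition, not the paper's extraction algorithm:

```python
# Minimal weighted-finite-automaton data structure (a sketch for
# intuition, not the paper's extraction algorithm).
import numpy as np

class WFA:
    def __init__(self, init, transitions, final):
        self.init = init        # initial weights, shape (n,)
        self.T = transitions    # symbol -> (n, n) transition matrix
        self.final = final      # final weights, shape (n,)

    def score(self, word):
        v = self.init
        for symbol in word:
            v = v @ self.T[symbol]
        return float(v @ self.final)

wfa = WFA(np.array([1.0, 0.0]),
          {"a": np.array([[0.5, 0.5], [0.0, 1.0]])},
          np.array([0.0, 1.0]))
print(wfa.score("aa"))  # 0.75
```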

Using Z3 for Formal Modeling and Verification of FNN Global Robustness

1 code implementation • 20 Apr 2023 • Yihao Zhang, Zeming Wei, Xiyue Zhang, Meng Sun

To evaluate the effectiveness of our implementation and improvements, we conduct extensive experiments on a set of benchmark datasets.

Adversarial Robustness
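
As a toy illustration of the Z3-based approach in the entry above (not the paper's actual encoding), one can model a one-neuron ReLU network with z3py and check an output bound over the entire input domain rather than at sampled points:

```python
# Toy illustration (not the paper's encoding): model a one-neuron ReLU
# network in Z3 and verify an output bound over the whole input domain.
from z3 import Real, Solver, If, And, sat

x = Real("x")
hidden = If(2 * x - 1 > 0, 2 * x - 1, 0)  # ReLU(2x - 1)
out = 3 * hidden                          # linear output layer

s = Solver()
s.add(And(x >= 0, x <= 1))                # input domain
s.add(out > 3)                            # negation of "out <= 3 everywhere"
# unsat means no counterexample exists, i.e. the bound holds globally.
print("property verified" if s.check() != sat else "counterexample found")
```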

CFA: Class-wise Calibrated Fair Adversarial Training

1 code implementation • CVPR 2023 • Zeming Wei, Yifei Wang, Yiwen Guo, Yisen Wang

Adversarial training has been widely acknowledged as the most effective method for improving the robustness of Deep Neural Networks (DNNs) against adversarial examples.

Adversarial Robustness • Fairness
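
The baseline that CFA calibrates is standard PGD adversarial training. A generic sketch of the inner PGD attack, which does not include CFA's class-wise calibration itself:

```python
# Generic PGD attack for adversarial training (the baseline CFA builds
# on; this sketch omits CFA's class-wise calibration).
import torch

def pgd_attack(model, loss_fn, x, y, eps=8/255, alpha=2/255, steps=10):
    # Start from a random point inside the eps-ball around x.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        grad, = torch.autograd.grad(loss_fn(model(x_adv), y), x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()       # ascend on the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project into eps-ball
            x_adv = x_adv.clamp(0, 1)                 # keep valid pixel range
    return x_adv
```

During training the model then takes its gradient step on loss_fn(model(x_adv), y); per the entry above, CFA's contribution is roughly to calibrate such attack and training configurations class-wise so that robustness is fair across classes.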

Extracting Weighted Finite Automata from Recurrent Neural Networks for Natural Languages

1 code implementation • 27 Jun 2022 • Zeming Wei, Xiyue Zhang, Meng Sun

Compositional approaches that are scalable to natural languages fall short in extraction precision.

Data Augmentation
