no code implementations • 20 Feb 2024 • Zeyang Sha, Yang Zhang
Our proposed prompt stealing attack aims to steal these well-designed prompts based on the generated answers.
no code implementations • 5 Feb 2024 • Junjie Chu, Zeyang Sha, Michael Backes, Yang Zhang
We then introduce two advanced attacks aimed at better reconstructing previous conversations, specifically the UNR attack and the PBU attack.
no code implementations • 3 Nov 2023 • Boyang Zhang, Xinyue Shen, Wai Man Si, Zeyang Sha, Zeyuan Chen, Ahmed Salem, Yun Shen, Michael Backes, Yang Zhang
Moderating offensive, hateful, and toxic language has always been an important but challenging topic for the safe use of NLP.
no code implementations • 9 Mar 2023 • Ziqing Yang, Zeyang Sha, Michael Backes, Yang Zhang
To this end, we propose SeMap, a more effective mapping that exploits the semantic alignment between the pre-trained model's knowledge and the downstream task.
no code implementations • 18 Dec 2022 • Zeyang Sha, Xinlei He, Pascal Berrang, Mathias Humbert, Yang Zhang
Backdoor attacks represent one of the major threats to machine learning models.
no code implementations • 13 Oct 2022 • Zeyang Sha, Zheng Li, Ning Yu, Yang Zhang
To tackle this problem, we pioneer a systematic study on the detection and attribution of fake images generated by text-to-image generation models.
1 code implementation • CVPR 2023 • Zeyang Sha, Xinlei He, Ning Yu, Michael Backes, Yang Zhang
Self-supervised representation learning techniques have developed rapidly to make full use of unlabeled images.