no code implementations • 7 Feb 2025 • Zeren Luo, Zifan Peng, Yule Liu, Zhen Sun, Mingchen Li, Jingyi Zheng, Xinlei He
However, we observe that these AIPSEs raise risks such as quoting malicious content or citing malicious websites, leading to the dissemination of harmful or unverified information.
1 code implementation • 6 Feb 2025 • Heyi Zhang, Yule Liu, Xinlei He, Jun Wu, Tianshuo Cong, Xinyi Huang
This paper aims to provide a unified benchmark and analysis of defenses against DPAs and MPAs, clarifying the distinction between these two similar yet distinct domains.
1 code implementation • 26 Dec 2024 • Jingyi Zheng, Tianyi Hu, Tianshuo Cong, Xinlei He
Backdoor attacks significantly compromise the security of large language models by triggering them to output specific and controlled content.
no code implementations • 24 Dec 2024 • Zhen Sun, Zongmin Zhang, Xinyue Shen, Ziyi Zhang, Yule Liu, Michael Backes, Yang Zhang, Xinlei He
To address this gap, this paper aims to quantify, monitor, and analyze the AIGTs on online social media platforms.
1 code implementation • 23 Dec 2024 • Yule Liu, Zhiyuan Zhong, Yifan Liao, Zhen Sun, Jingyi Zheng, Jiaheng Wei, Qingyuan Gong, Fenghua Tong, Yang Chen, Yang Zhang, Xinlei He
The rise of large language models (LLMs) has raised concerns about machine-generated text (MGT), including ethical and practical issues like plagiarism and misinformation.
no code implementations • 2 Dec 2024 • Wenhan Dong, Chao Lin, Xinlei He, Xinyi Huang, Shengmin Xu
Privacy-preserving federated learning (PPFL) aims to train a global model for multiple clients while maintaining their data privacy.
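As background on the PPFL setting, below is a minimal federated-averaging sketch in plain NumPy; the client data, local model, and update rule are hypothetical stand-ins for illustration, not the scheme proposed in the paper.

```python
# Minimal federated-averaging sketch (hypothetical setup, not the paper's scheme).
# Each client fits a local linear model on its private data; only the weights
# leave the client, and the server averages them into a global model.
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=5):
    """One client's local training: a few steps of least-squares gradient descent."""
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

rng = np.random.default_rng(0)
w_global = np.zeros(3)
clients = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(4)]

for rnd in range(10):
    local_weights = [local_update(w_global.copy(), X, y) for X, y in clients]
    w_global = np.mean(local_weights, axis=0)   # server aggregates weights only
```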
no code implementations • 29 Nov 2024 • Yule Liu, Zhen Sun, Xinlei He, Xinyi Huang
On the security side, user-defined fine-tuning can introduce vulnerabilities such as alignment issues, backdoor attacks, and hallucinations.
no code implementations • 21 Aug 2024 • Minghao Liu, Zonglin Di, Jiaheng Wei, Zhongruo Wang, Hengxiang Zhang, Ruixuan Xiao, Haoyu Wang, Jinlong Pang, Hao Chen, Ankit Shah, Hongxin Wei, Xinlei He, Zhaowei Zhao, Haobo Wang, Lei Feng, Jindong Wang, James Davis, Yang Liu
Furthermore, we design three benchmark datasets focused on label noise detection, label noise learning, and class-imbalanced learning.
no code implementations • 13 Aug 2024 • Zheng Li, Xinlei He, Ning Yu, Yang Zhang
Masked Image Modeling (MIM) has achieved significant success in the realm of self-supervised learning (SSL) for visual recognition.
no code implementations • 5 Jul 2024 • Sibo Yi, Yule Liu, Zhen Sun, Tianshuo Cong, Xinlei He, Jiaxing Song, Ke Xu, Qi Li
In this paper, we propose a comprehensive and detailed taxonomy of jailbreak attack and defense methods.
no code implementations • 5 Jul 2024 • Zesen Liu, Tianshuo Cong, Xinlei He, Qi Li
Large Language Models (LLMs) excel in various applications, including text generation and complex tasks.
1 code implementation • 13 Jun 2024 • Delong Ran, JinYuan Liu, Yichen Gong, Jingyi Zheng, Xinlei He, Tianshuo Cong, Anyu Wang
In summary, we regard JailbreakEval as a catalyst that simplifies the evaluation process in jailbreak research and fosters an inclusive standard for jailbreak evaluation within the community.
no code implementations • 8 Jun 2024 • Yanling Wang, Haoyang Li, Hao Zou, Jing Zhang, Xinlei He, Qi Li, Ke Xu
Despite advancements in large language models (LLMs), non-factual responses remain prevalent.
1 code implementation • 9 May 2024 • Yixin Wu, Xinlei He, Pascal Berrang, Mathias Humbert, Michael Backes, Neil Zhenqiang Gong, Yang Zhang
This paper fills the gap by conducting a systematic privacy analysis of inductive GNNs through the lens of link stealing attacks, one of the most popular attacks specifically designed for GNNs.
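For intuition, link stealing attacks typically compare the target model's posteriors for two nodes and predict an edge when they are similar; the sketch below uses cosine similarity over synthetic posteriors, which is an illustrative assumption rather than the paper's exact attack.

```python
# Illustrative link-stealing heuristic (assumed setup, not the paper's exact attack):
# nodes whose GNN posteriors are similar are predicted to be connected.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def predict_link(posterior_u, posterior_v, threshold=0.9):
    """Guess that an edge (u, v) exists when the posteriors are close."""
    return cosine(posterior_u, posterior_v) >= threshold

# Synthetic posteriors standing in for the target GNN's outputs.
rng = np.random.default_rng(1)
p_u = rng.dirichlet(np.ones(5))
p_v = rng.dirichlet(np.ones(5))
print(predict_link(p_u, p_v))
```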
1 code implementation • 8 Apr 2024 • Tianshuo Cong, Delong Ran, Zesen Liu, Xinlei He, JinYuan Liu, Yichen Gong, Qi Li, Anyu Wang, XiaoYun Wang
Model merging is a promising lightweight model empowerment technique that does not rely on expensive computing devices (e.g., GPUs) or require the collection of specific training data.
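As background, the simplest merging strategy is parameter-wise averaging of fine-tuned checkpoints, sketched below with PyTorch state dicts; the models and mixing weight are placeholders, and this is a generic illustration rather than the attack studied in the paper.

```python
# Minimal parameter-averaging merge of two fine-tuned checkpoints
# (a generic illustration of model merging, not the attack in the paper).
import torch

def merge_state_dicts(sd_a, sd_b, alpha=0.5):
    """Element-wise interpolation of two state dicts with the same keys/shapes."""
    return {k: alpha * sd_a[k] + (1 - alpha) * sd_b[k] for k in sd_a}

model_a = torch.nn.Linear(8, 2)   # stand-ins for two fine-tuned upstream models
model_b = torch.nn.Linear(8, 2)
merged = torch.nn.Linear(8, 2)
merged.load_state_dict(merge_state_dicts(model_a.state_dict(), model_b.state_dict()))
```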
no code implementations • 16 Oct 2023 • Joann Qiongna Chen, Xinlei He, Zheng Li, Yang Zhang, Zhou Li
Training a machine learning model with data following a meaningful order, i.e., from easy to hard, has been proven to be effective in accelerating the training process and achieving better model performance.
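One common way to realize such an easy-to-hard ordering is to rank training examples by the loss of a preliminary model and feed them to the learner in that order; the sketch below illustrates the idea, where the loss-based scoring is an assumption for illustration, not the paper's method.

```python
# Easy-to-hard curriculum sketch: rank samples by a scoring model's loss and
# train on them in that order (a generic illustration, not the paper's method).
import numpy as np

def curriculum_order(losses):
    """Return sample indices sorted from easiest (lowest loss) to hardest."""
    return np.argsort(losses)

# Hypothetical per-sample losses from a preliminary "scoring" model.
losses = np.array([2.3, 0.1, 0.7, 1.5])
order = curriculum_order(losses)       # -> [1, 2, 3, 0]
# A training loop would then iterate over batches following `order`,
# optionally growing the pool of included samples over epochs.
```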
1 code implementation • 10 Aug 2023 • Xinlei He, Savvas Zannettou, Yun Shen, Yang Zhang
We find that prompt learning achieves around 10% improvement in the toxicity classification task compared to the baselines, while for the toxic span detection task we find comparable performance to the best baseline (0.643 vs. 0.640 in terms of $F_1$-score).
1 code implementation • 13 Jun 2023 • Yihan Ma, Zhikun Zhang, Ning Yu, Xinlei He, Michael Backes, Yun Shen, Yang Zhang
Graph generative models have become increasingly effective for data distribution approximation and data augmentation.
no code implementations • 13 Jun 2023 • Yihan Ma, Zhengyu Zhao, Xinlei He, Zheng Li, Michael Backes, Yang Zhang
In particular, to help the watermark survive the subject-driven synthesis, we incorporate the synthesis process in learning GenWatermark by fine-tuning the detector with synthesized images for a specific subject.
1 code implementation • 23 May 2023 • Yiting Qu, Xinyue Shen, Xinlei He, Michael Backes, Savvas Zannettou, Yang Zhang
Our evaluation shows that 24% of the images generated using DreamBooth are hateful meme variants that present the features of the original hateful meme and the target individual/community; these generated images are comparable to hateful meme variants collected from the real world.
3 code implementations • 26 Mar 2023 • Xinlei He, Xinyue Shen, Zeyuan Chen, Michael Backes, Yang Zhang
Extensive evaluations on public datasets with curated texts generated by various powerful LLMs such as ChatGPT-turbo and Claude demonstrate the effectiveness of different detection methods.
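For a sense of what such a detector looks like, the snippet below trains a simple TF-IDF plus logistic-regression baseline on labeled human/machine texts; it is a generic illustration with placeholder data, not one of the specific detection methods benchmarked in the paper.

```python
# Toy machine-generated-text detector: TF-IDF features + logistic regression
# (an illustrative baseline, not a specific method evaluated in the paper).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["example human-written sentence ...",      # placeholders for real data
         "example machine-generated sentence ..."]
labels = [0, 1]                                      # 0 = human, 1 = machine

detector = make_pipeline(TfidfVectorizer(), LogisticRegression())
detector.fit(texts, labels)
print(detector.predict(["another unseen sentence ..."]))
```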
1 code implementation • 23 Feb 2023 • Boyang Zhang, Xinlei He, Yun Shen, Tianhao Wang, Yang Zhang
Given the simplicity and effectiveness of the attack method, our study indicates scientific plots indeed constitute a valid side channel for model information stealing attacks.
no code implementations • 18 Dec 2022 • Zeyang Sha, Xinlei He, Pascal Berrang, Mathias Humbert, Yang Zhang
Backdoor attacks represent one of the major threats to machine learning models.
2 code implementations • 13 Dec 2022 • Yiting Qu, Xinlei He, Shannon Pierson, Michael Backes, Yang Zhang, Savvas Zannettou
The dissemination of hateful memes online has adverse effects on social media platforms and the real world.
no code implementations • 4 Oct 2022 • Xinyue Shen, Xinlei He, Zheng Li, Yun Shen, Michael Backes, Yang Zhang
Different from previous work, we are the first to systematically perform threat modeling on SSL in every phase of the model supply chain, i.e., the pre-training, release, and downstream phases.
1 code implementation • 30 Sep 2022 • Ziqing Yang, Xinlei He, Zheng Li, Michael Backes, Mathias Humbert, Pascal Berrang, Yang Zhang
Extensive evaluations on different datasets and model architectures show that all three attacks can achieve significant attack performance while maintaining model utility in both visual and linguistic modalities.
no code implementations • 23 Aug 2022 • Zheng Li, Yiyong Liu, Xinlei He, Ning Yu, Michael Backes, Yang Zhang
Furthermore, we propose a hybrid attack that exploits the exit information to improve the performance of existing attacks.
no code implementations • 22 Aug 2022 • Xinlei He, Zheng Li, Weilin Xu, Cory Cornelius, Yang Zhang
Finally, we find that data augmentation degrades the performance of existing attacks to a larger extent, and we propose an adaptive attack using augmentation to train shadow and attack models that improve attack performance.
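At a high level, a shadow-model membership inference attack trains a shadow model on data with known membership and then fits an attack classifier on the shadow model's output confidences; the sketch below shows that generic pipeline with simplified models and synthetic data, and does not reproduce the augmentation-aware adaptive attack proposed in the paper.

```python
# Skeleton of a shadow-model membership inference attack (simplified assumptions;
# the paper's augmentation-aware adaptive attack is not reproduced here).
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_member, y_member = rng.normal(size=(200, 10)), rng.integers(0, 2, 200)
X_nonmember = rng.normal(size=(200, 10))

# 1) Train a shadow model on data whose membership we control.
shadow = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300).fit(X_member, y_member)

# 2) Build an attack dataset from the shadow model's confidence vectors.
feat = np.vstack([shadow.predict_proba(X_member), shadow.predict_proba(X_nonmember)])
membership = np.array([1] * len(X_member) + [0] * len(X_nonmember))

# 3) The attack model learns to separate members from non-members.
attack = LogisticRegression().fit(feat, membership)
```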
1 code implementation • 25 Jul 2022 • Xinlei He, Hongbin Liu, Neil Zhenqiang Gong, Yang Zhang
The results show that early stopping can mitigate the membership inference attack, but at the cost of degraded model utility.
1 code implementation • 27 Jan 2022 • Tianshuo Cong, Xinlei He, Yang Zhang
Recent research has shown that the machine learning model's copyright is threatened by model stealing attacks, which aim to train a surrogate model to mimic the behavior of a given model.
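As background, model stealing generally queries the victim model and trains a surrogate on the returned labels or probabilities; the sketch below illustrates this query-and-imitate loop with scikit-learn models, where the victim, query distribution, and surrogate are hypothetical and the paper's copyright-protection scheme is not shown.

```python
# Generic model-stealing loop: query a victim model, train a surrogate on its
# answers (a background illustration, not the defense studied in the paper).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_private = rng.normal(size=(500, 8))
y_private = (X_private[:, 0] > 0).astype(int)

victim = RandomForestClassifier(n_estimators=50).fit(X_private, y_private)

# Attacker only sees the victim's predictions on its own query set.
X_query = rng.normal(size=(300, 8))
y_query = victim.predict(X_query)

surrogate = LogisticRegression().fit(X_query, y_query)   # mimics the victim
print("agreement:", (surrogate.predict(X_query) == y_query).mean())
```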
1 code implementation • CVPR 2023 • Zeyang Sha, Xinlei He, Ning Yu, Michael Backes, Yang Zhang
Self-supervised representation learning techniques have been developing rapidly to make full use of unlabeled images.
1 code implementation • 15 Dec 2021 • Yun Shen, Xinlei He, Yufei Han, Yang Zhang
Graph neural networks (GNNs), a new family of machine learning (ML) models, have been proposed to fully leverage graph data to build powerful applications.
no code implementations • 10 Feb 2021 • Xinlei He, Rui Wen, Yixin Wu, Michael Backes, Yun Shen, Yang Zhang
To fully utilize the information contained in graph data, a new family of machine learning (ML) models, namely graph neural networks (GNNs), has been introduced.
1 code implementation • 8 Feb 2021 • Xinlei He, Yang Zhang
Our experimental results show that contrastive models trained on image datasets are less vulnerable to membership inference attacks but more vulnerable to attribute inference attacks compared to supervised models.
1 code implementation • 4 Feb 2021 • Yugeng Liu, Rui Wen, Xinlei He, Ahmed Salem, Zhikun Zhang, Michael Backes, Emiliano De Cristofaro, Mario Fritz, Yang Zhang
As a result, we lack a comprehensive picture of the risks caused by the attacks, e.g., the different scenarios they can be applied to, the common factors that influence their performance, the relationship among them, or the effectiveness of possible defenses.
1 code implementation • 5 May 2020 • Xinlei He, Jinyuan Jia, Michael Backes, Neil Zhenqiang Gong, Yang Zhang
In this work, we propose the first attacks to steal a graph from the outputs of a GNN model that is trained on the graph.