1 code implementation • 1 May 2025 • Jinsheng Pan, Xiaogeng Liu, Chaowei Xiao
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation, enabling their widespread adoption across various domains.
no code implementations • 27 Apr 2025 • Weidi Luo, Qiming Zhang, Tianyu Lu, Xiaogeng Liu, Yue Zhao, Zhen Xiang, Chaowei Xiao
The increasing capabilities of agentic multi-modal large reasoning models, such as ChatGPT o3, have raised critical concerns regarding privacy leakage through inadvertent image geolocation.
no code implementations • 17 Feb 2025 • Weidi Luo, Shenghong Dai, Xiaogeng Liu, Suman Banerjee, Huan Sun, Muhao Chen, Chaowei Xiao
The rapid advancements in Large Language Models (LLMs) have enabled their deployment as autonomous agents for handling complex tasks in dynamic environments.
2 code implementations • 30 Oct 2024 • Hao Li, Xiaogeng Liu
NotInject contains 339 benign samples enriched with trigger words common in prompt injection attacks, enabling fine-grained evaluation.
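The excerpt suggests that NotInject is meant to probe whether a prompt-injection guard over-triggers on benign prompts that merely contain suspicious trigger words. A minimal sketch of that kind of evaluation, assuming a hypothetical file name and a `guard_model.is_injection` interface (not the paper's released code), could look like:

```python
# Minimal sketch: measure how often a guard model wrongly flags benign prompts
# that contain injection-style trigger words. The file name and the
# `guard_model.is_injection` interface are hypothetical, not the released code.
import json

def false_positive_rate(guard_model, samples):
    """Fraction of benign prompts flagged as prompt injections."""
    flagged = sum(1 for s in samples if guard_model.is_injection(s["prompt"]))
    return flagged / len(samples)

# Usage (illustrative):
# with open("notinject_benign.jsonl") as f:
#     benign = [json.loads(line) for line in f]
# print(f"Over-trigger rate: {false_positive_rate(guard_model, benign):.2%}")
```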
no code implementations • 11 Oct 2024 • Peiran Wang, Xiaogeng Liu, Chaowei Xiao
In this study, we introduce RePD, an innovative Retrieval-based Prompt Decomposition framework designed to mitigate the risk of jailbreak attacks on large language models (LLMs).
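Read at face value, a retrieval-based prompt-decomposition defense would retrieve a known jailbreak template similar to the incoming prompt, use it to separate the template-like wrapper from the underlying request, and respond only to the benign core. The sketch below is a loose illustration under those assumptions; the `retriever` and `llm` components are hypothetical, and this is not the paper's implementation:

```python
# Loose illustration of a retrieval-based prompt-decomposition defense.
# `retriever` and `llm` are hypothetical components, not the paper's code.

def defend(user_prompt, retriever, llm):
    # Retrieve the stored jailbreak template most similar to the incoming prompt.
    template = retriever.most_similar(user_prompt)

    # Ask the model to strip any template-like wrapper and expose the core
    # request, using the retrieved template as a one-shot demonstration.
    core_request = llm.generate(
        "Example jailbreak template:\n" + template + "\n\n"
        "Remove any such wrapper and restate only the core request in:\n"
        + user_prompt
    )

    # Respond only if the exposed core request is judged safe.
    if llm.is_safe(core_request):
        return llm.generate(core_request)
    return "Request declined."
```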
2 code implementations • 3 Oct 2024 • Xiaogeng Liu, Peiran Li, Edward Suh, Yevgeniy Vorobeychik, Zhuoqing Mao, Somesh Jha, Patrick McDaniel, Huan Sun, Bo Li, Chaowei Xiao
In this paper, we propose AutoDAN-Turbo, a black-box jailbreak method that can automatically discover as many jailbreak strategies as possible from scratch, without any human intervention or predefined scopes (e.g., specified candidate strategies), and use them for red-teaming.
1 code implementation • 13 Jun 2024 • Fei Wang, Xingyu Fu, James Y. Huang, Zekun Li, Qin Liu, Xiaogeng Liu, Mingyu Derek Ma, Nan Xu, Wenxuan Zhou, Kai Zhang, Tianyi Lorena Yan, Wenjie Jacky Mo, Hsiang-Hui Liu, Pan Lu, Chunyuan Li, Chaowei Xiao, Kai-Wei Chang, Dan Roth, Sheng Zhang, Hoifung Poon, Muhao Chen
We introduce MuirBench, a comprehensive benchmark that focuses on robust multi-image understanding capabilities of multimodal LLMs.
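As a loose illustration of how such a multi-image benchmark might be consumed, the snippet below iterates over hypothetical (images, question, options, answer) records and scores a model's multiple-choice accuracy; the field names and `model.answer` interface are assumptions, not MuirBench's actual schema or evaluation harness:

```python
# Loose sketch of scoring a multimodal LLM on multi-image multiple-choice items.
# Record fields and the `model.answer` interface are assumed for illustration.

def multiple_choice_accuracy(model, records):
    correct = 0
    for r in records:
        # Each record bundles several images with one question and lettered options.
        pred = model.answer(images=r["images"], question=r["question"],
                            options=r["options"])
        correct += int(pred == r["answer"])
    return correct / len(records)
```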
no code implementations • 25 May 2024 • Siyuan Ma, Weidi Luo, Yu Wang, Xiaogeng Liu
With the advent and widespread deployment of Multimodal Large Language Models (MLLMs), ensuring their safety has become increasingly critical.
1 code implementation • 3 Apr 2024 • Weidi Luo, Siyuan Ma, Xiaogeng Liu, Xiaoyu Guo, Chaowei Xiao
With the rapid advancements in Multimodal Large Language Models (MLLMs), securing these models against malicious inputs while aligning them with human values has emerged as a critical challenge.
no code implementations • 26 Mar 2024 • Zhiyuan Yu, Xiaogeng Liu, Shunning Liang, Zach Cameron, Chaowei Xiao, Ning Zhang
Building on insights from the user study, we also developed an AI-assisted system to automate jailbreak prompt generation.
1 code implementation • 14 Mar 2024 • Yu Wang, Xiaogeng Liu, Yu Li, Muhao Chen, Chaowei Xiao
However, with the integration of additional modalities, MLLMs are exposed to new vulnerabilities, rendering them prone to structure-based jailbreak attacks, where semantic content (e.g., "harmful text") is injected into images to mislead MLLMs.
1 code implementation • 7 Mar 2024 • Xiaogeng Liu, Zhiyuan Yu, Yizhe Zhang, Ning Zhang, Chaowei Xiao
Large Language Models (LLMs) excel in processing and generating human language, powered by their ability to interpret and follow instructions.
no code implementations • 7 Dec 2023 • Fangzhou Wu, Xiaogeng Liu, Chaowei Xiao
In this paper, we introduce DeceptPrompt, a novel algorithm that can generate adversarial natural language instructions that drive Code LLMs to generate functionally correct code containing vulnerabilities.
2 code implementations • 3 Oct 2023 • Xiaogeng Liu, Nan Xu, Muhao Chen, Chaowei Xiao
In light of these challenges, we intend to answer this question: Can we develop an approach that can automatically generate stealthy jailbreak prompts?
1 code implementation • 15 Jul 2023 • Yechao Zhang, Shengshan Hu, Leo Yu Zhang, Junyu Shi, Minghui Li, Xiaogeng Liu, Wei Wan, Hai Jin
Building on these insights, we explore the impact of data augmentation and gradient regularization on transferability and find that the trade-off generally holds across various training mechanisms, thereby building a comprehensive blueprint for the regulation mechanism behind transferability.
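For concreteness, gradient regularization in this kind of training usually means adding an input-gradient norm penalty to the task loss. A minimal PyTorch-style sketch, assuming a penalty weight `lambda_reg` (illustrative, not the authors' code):

```python
# Minimal sketch of gradient-regularized training: the usual task loss plus a
# penalty on the norm of the input gradient. Illustrative only; `lambda_reg`
# is an assumed hyperparameter, and this is not the authors' released code.
import torch
import torch.nn.functional as F

def grad_reg_loss(model, x, y, lambda_reg=0.1):
    # Make the inputs require gradients so we can penalize d(loss)/dx.
    x = x.clone().requires_grad_(True)
    task_loss = F.cross_entropy(model(x), y)
    # Keep the input gradient in the graph so the penalty stays differentiable
    # with respect to the model parameters.
    (grad_x,) = torch.autograd.grad(task_loss, x, create_graph=True)
    penalty = grad_x.flatten(1).norm(dim=1).pow(2).mean()
    return task_loss + lambda_reg * penalty
```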
1 code implementation • CVPR 2023 • Xiaogeng Liu, Minghui Li, Haoyu Wang, Shengshan Hu, Dengpan Ye, Hai Jin, Libing Wu, Chaowei Xiao
Deep neural networks have been proven vulnerable to backdoor attacks.
no code implementations • 8 Mar 2022 • Xiaogeng Liu, Haoyu Wang, Yechao Zhang, Fangzhou Wu, Shengshan Hu
Data-centric machine learning aims to find effective ways to build appropriate datasets that can improve the performance of AI models.
1 code implementation • CVPR 2022 • Shengshan Hu, Xiaogeng Liu, Yechao Zhang, Minghui Li, Leo Yu Zhang, Hai Jin, Libing Wu
While deep face recognition (FR) systems have shown amazing performance in identification and verification, they also raise privacy concerns due to their excessive surveillance of users, especially for public face images widely shared on social networks.