Search Results for author: Xiaogeng Liu

Found 18 papers, 11 papers with code

OET: Optimization-based prompt injection Evaluation Toolkit

1 code implementation1 May 2025 Jinsheng Pan, Xiaogeng Liu, Chaowei Xiao

Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation, enabling their widespread adoption across various domains.

Adversarial Robustness Natural Language Understanding +1

Doxing via the Lens: Revealing Privacy Leakage in Image Geolocation for Agentic Multi-Modal Large Reasoning Model

no code implementations27 Apr 2025 Weidi Luo, Qiming Zhang, Tianyu Lu, Xiaogeng Liu, Yue Zhao, Zhen Xiang, Chaowei Xiao

The increasing capabilities of agentic multi-modal large reasoning models, such as ChatGPT o3, have raised critical concerns regarding privacy leakage through inadvertent image geolocation.

Visual Reasoning

AGrail: A Lifelong Agent Guardrail with Effective and Adaptive Safety Detection

no code implementations17 Feb 2025 Weidi Luo, Shenghong Dai, Xiaogeng Liu, Suman Banerjee, Huan Sun, Muhao Chen, Chaowei Xiao

The rapid advancements in Large Language Models (LLMs) have enabled their deployment as autonomous agents for handling complex tasks in dynamic environments.

InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models

2 code implementations30 Oct 2024 Hao Li, Xiaogeng Liu

NotInject contains 339 benign samples enriched with trigger words common in prompt injection attacks, enabling fine-grained evaluation.

Benchmarking

RePD: Defending Jailbreak Attack through a Retrieval-based Prompt Decomposition Process

no code implementations11 Oct 2024 Peiran Wang, Xiaogeng Liu, Chaowei Xiao

In this study, we introduce RePD, an innovative attack Retrieval-based Prompt Decomposition framework designed to mitigate the risk of jailbreak attacks on large language models (LLMs).

One-Shot Learning Retrieval

AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs

2 code implementations3 Oct 2024 Xiaogeng Liu, Peiran Li, Edward Suh, Yevgeniy Vorobeychik, Zhuoqing Mao, Somesh Jha, Patrick McDaniel, Huan Sun, Bo Li, Chaowei Xiao

In this paper, we propose AutoDAN-Turbo, a black-box jailbreak method that can automatically discover as many jailbreak strategies as possible from scratch, without any human intervention or predefined scopes (e. g., specified candidate strategies), and use them for red-teaming.

Red Teaming

Visual-RolePlay: Universal Jailbreak Attack on MultiModal Large Language Models via Role-playing Image Character

no code implementations25 May 2024 Siyuan Ma, Weidi Luo, Yu Wang, Xiaogeng Liu

With the advent and widespread deployment of Multimodal Large Language Models (MLLMs), ensuring their safety has become increasingly critical.

JailBreakV: A Benchmark for Assessing the Robustness of MultiModal Large Language Models against Jailbreak Attacks

1 code implementation3 Apr 2024 Weidi Luo, Siyuan Ma, Xiaogeng Liu, XIAOYU GUO, Chaowei Xiao

With the rapid advancements in Multimodal Large Language Models (MLLMs), securing these models against malicious inputs while aligning them with human values has emerged as a critical challenge.

LLM Jailbreak

Don't Listen To Me: Understanding and Exploring Jailbreak Prompts of Large Language Models

no code implementations26 Mar 2024 Zhiyuan Yu, Xiaogeng Liu, Shunning Liang, Zach Cameron, Chaowei Xiao, Ning Zhang

Building on the insights from the user study, we also developed a system using AI as the assistant to automate the process of jailbreak prompt generation.

AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting

1 code implementation14 Mar 2024 Yu Wang, Xiaogeng Liu, Yu Li, Muhao Chen, Chaowei Xiao

However, with the integration of additional modalities, MLLMs are exposed to new vulnerabilities, rendering them prone to structured-based jailbreak attacks, where semantic content (e. g., "harmful text") has been injected into the images to mislead MLLMs.

Automatic and Universal Prompt Injection Attacks against Large Language Models

1 code implementation7 Mar 2024 Xiaogeng Liu, Zhiyuan Yu, Yizhe Zhang, Ning Zhang, Chaowei Xiao

Large Language Models (LLMs) excel in processing and generating human language, powered by their ability to interpret and follow instructions.

DeceptPrompt: Exploiting LLM-driven Code Generation via Adversarial Natural Language Instructions

no code implementations7 Dec 2023 Fangzhou Wu, Xiaogeng Liu, Chaowei Xiao

In this paper, we introduce DeceptPrompt, a novel algorithm that can generate adversarial natural language instructions that drive the Code LLMs to generate functionality correct code with vulnerabilities.

Code Generation Red Teaming

AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models

2 code implementations3 Oct 2023 Xiaogeng Liu, Nan Xu, Muhao Chen, Chaowei Xiao

In light of these challenges, we intend to answer this question: Can we develop an approach that can automatically generate stealthy jailbreak prompts?

Decision Making

Why Does Little Robustness Help? Understanding and Improving Adversarial Transferability from Surrogate Training

1 code implementation15 Jul 2023 Yechao Zhang, Shengshan Hu, Leo Yu Zhang, Junyu Shi, Minghui Li, Xiaogeng Liu, Wei Wan, Hai Jin

Building on these insights, we explore the impacts of data augmentation and gradient regularization on transferability and identify that the trade-off generally exists in the various training mechanisms, thus building a comprehensive blueprint for the regulation mechanism behind transferability.

Attribute Data Augmentation

Towards Efficient Data-Centric Robust Machine Learning with Noise-based Augmentation

no code implementations8 Mar 2022 Xiaogeng Liu, Haoyu Wang, Yechao Zhang, Fangzhou Wu, Shengshan Hu

The data-centric machine learning aims to find effective ways to build appropriate datasets which can improve the performance of AI models.

BIG-bench Machine Learning Data Augmentation

Protecting Facial Privacy: Generating Adversarial Identity Masks via Style-robust Makeup Transfer

1 code implementation CVPR 2022 Shengshan Hu, Xiaogeng Liu, Yechao Zhang, Minghui Li, Leo Yu Zhang, Hai Jin, Libing Wu

While deep face recognition (FR) systems have shown amazing performance in identification and verification, they also arouse privacy concerns for their excessive surveillance on users, especially for public face images widely spread on social networks.

Face Recognition

Cannot find the paper you are looking for? You can Submit a new open access paper.