Search Results for author: Yuanpu Cao

Found 8 papers, 5 papers with code

AdvI2I: Adversarial Image Attack on Image-to-Image Diffusion models

1 code implementation · 28 Oct 2024 · Yaopei Zeng, Yuanpu Cao, Bochuan Cao, Yurui Chang, Jinghui Chen, Lu Lin

Recent advances in diffusion models have significantly enhanced the quality of image synthesis, yet they have also introduced serious safety concerns, particularly the generation of Not Safe for Work (NSFW) content.

Adversarial Text, Image Generation

Adversarially Robust Industrial Anomaly Detection Through Diffusion Model

no code implementations · 9 Aug 2024 · Yuanpu Cao, Lu Lin, Jinghui Chen

We propose a simple yet effective adversarially robust anomaly detection method, AdvRAD, that allows the diffusion model to act both as an anomaly detector and adversarial purifier.

Adversarial Purification, Adversarial Robustness, +1 more

Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization

1 code implementation · 28 May 2024 · Yuanpu Cao, Tianrong Zhang, Bochuan Cao, Ziyi Yin, Lu Lin, Fenglong Ma, Jinghui Chen

Researchers have been studying approaches to steer the behavior of Large Language Models (LLMs) and build personalized LLMs tailored for various applications.

Hallucination

WordGame: Efficient & Effective LLM Jailbreak via Simultaneous Obfuscation in Query and Response

no code implementations · 22 May 2024 · Tianrong Zhang, Bochuan Cao, Yuanpu Cao, Lu Lin, Prasenjit Mitra, Jinghui Chen

The recent breakthrough in large language models (LLMs) such as ChatGPT has revolutionized production processes at an unprecedented pace.

LLM Jailbreak, Safety Alignment

Federated Learning with Projected Trajectory Regularization

no code implementations · 22 Dec 2023 · Tiejin Chen, Yuanpu Cao, Yujia Wang, Cho-Jui Hsieh, Jinghui Chen

FedPTR (Federated learning with Projected Trajectory Regularization) allows local clients or the server to optimize an auxiliary (synthetic) dataset that mimics the learning dynamics of the recent model update, and uses it to project the next-step model trajectory for local training regularization.

Federated Learning

Stealthy and Persistent Unalignment on Large Language Models via Backdoor Injections

1 code implementation · 15 Nov 2023 · Yuanpu Cao, Bochuan Cao, Jinghui Chen

In this work, we show that it is possible to conduct stealthy and persistent unalignment on large language models via backdoor injections.

Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM

1 code implementation · 18 Sep 2023 · Bochuan Cao, Yuanpu Cao, Lu Lin, Jinghui Chen

In this work, we introduce a Robustly Aligned LLM (RA-LLM) to defend against potential alignment-breaking attacks.

RLCard: A Toolkit for Reinforcement Learning in Card Games

9 code implementations · 10 Oct 2019 · Daochen Zha, Kwei-Herng Lai, Yuanpu Cao, Songyi Huang, Ruzhe Wei, Junyu Guo, Xia Hu

The goal of RLCard is to bridge reinforcement learning and imperfect information games, and push forward the research of reinforcement learning in domains with multiple agents, large state and action space, and sparse reward.

Board Games, Game of Poker, +4 more
