Search Results for author: Bochuan Cao

Found 15 papers, 7 papers with code

Monitoring Decoding: Mitigating Hallucination via Evaluating the Factuality of Partial Response during Generation

no code implementations • 5 Mar 2025 • Yurui Chang, Bochuan Cao, Lu Lin

While large language models have demonstrated exceptional performance across a wide range of tasks, they remain susceptible to hallucinations -- generating plausible yet factually incorrect content.

Hallucination

TruthFlow: Truthful LLM Generation via Representation Flow Correction

no code implementations • 6 Feb 2025 • Hanyu Wang, Bochuan Cao, Yuanpu Cao, Jinghui Chen

Large language models (LLMs) are known to struggle with consistently generating truthful responses.

Hallucination • TruthfulQA

Data Free Backdoor Attacks

1 code implementation • 9 Dec 2024 • Bochuan Cao, Jinyuan Jia, Chuxuan Hu, Wenbo Guo, Zhen Xiang, Jinghui Chen, Bo Li, Dawn Song

Existing backdoor attacks require either retraining the classifier with some clean data or modifying the model's architecture.

Backdoor Attack

AdvI2I: Adversarial Image Attack on Image-to-Image Diffusion models

1 code implementation • 28 Oct 2024 • Yaopei Zeng, Yuanpu Cao, Bochuan Cao, Yurui Chang, Jinghui Chen, Lu Lin

Recent advances in diffusion models have significantly enhanced the quality of image synthesis, yet they have also introduced serious safety concerns, particularly the generation of Not Safe for Work (NSFW) content.

Adversarial Text • Image Generation

On the Intrinsic Self-Correction Capability of LLMs: Uncertainty and Latent Concept

no code implementations • 4 Jun 2024 • Guangliang Liu, Haitao Mao, Bochuan Cao, Zhiyu Xue, Xitong Zhang, Rongrong Wang, Jiliang Tang, Kristen Johnson

Our findings are verified in two settings: (1) multi-round question answering, where we comprehensively demonstrate that intrinsic self-correction can progressively introduce performance gains through iterative interactions, ultimately converging to stable performance; and (2) intrinsic self-correction for enhanced morality, where we provide empirical evidence that iteratively applying instructions reduces model uncertainty until it converges, which in turn drives both the calibration error and the self-correction performance to converge, resulting in a stable state of intrinsic self-correction.

Question Answering • Safety Alignment

XPrompt: Explaining Large Language Model's Generation via Joint Prompt Attribution

no code implementations • 30 May 2024 • Yurui Chang, Bochuan Cao, Yujia Wang, Jinghui Chen, Lu Lin

In this study, we introduce a counterfactual explanation framework based on joint prompt attribution, XPrompt, which aims to explain how a few prompt texts collaboratively influence the LLM's complete generation.

Combinatorial Optimization • counterfactual • +2

Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization

1 code implementation • 28 May 2024 • Yuanpu Cao, Tianrong Zhang, Bochuan Cao, Ziyi Yin, Lu Lin, Fenglong Ma, Jinghui Chen

Researchers have been studying approaches to steer the behavior of Large Language Models (LLMs) and build personalized LLMs tailored for various applications.

Hallucination

WordGame: Efficient & Effective LLM Jailbreak via Simultaneous Obfuscation in Query and Response

no code implementations • 22 May 2024 • Tianrong Zhang, Bochuan Cao, Yuanpu Cao, Lu Lin, Prasenjit Mitra, Jinghui Chen

The recent breakthrough in large language models (LLMs) such as ChatGPT has revolutionized production processes at an unprecedented pace.

LLM Jailbreak • Safety Alignment

On the Difficulty of Defending Contrastive Learning against Backdoor Attacks

no code implementations • 14 Dec 2023 • Changjiang Li, Ren Pang, Bochuan Cao, Zhaohan Xi, Jinghui Chen, Shouling Ji, Ting Wang

Recent studies have shown that contrastive learning, like supervised learning, is highly vulnerable to backdoor attacks wherein malicious functions are injected into target models, only to be activated by specific triggers.

Contrastive Learning

Stealthy and Persistent Unalignment on Large Language Models via Backdoor Injections

1 code implementation • 15 Nov 2023 • Yuanpu Cao, Bochuan Cao, Jinghui Chen

In this work, we show that it is possible to conduct stealthy and persistent unalignment on large language models via backdoor injections.

Red Teaming

IMPRESS: Evaluating the Resilience of Imperceptible Perturbations Against Unauthorized Data Usage in Diffusion-Based Generative AI

1 code implementation • NeurIPS 2023 • Bochuan Cao, Changjiang Li, Ting Wang, Jinyuan Jia, Bo Li, Jinghui Chen

IMPRESS is based on the key observation that imperceptible perturbations can lead to a perceptible inconsistency between the original image and its diffusion-reconstructed counterpart. This inconsistency can be used to devise a new optimization strategy for purifying the image, which may weaken the protection of the original image against unauthorized data usage (e.g., style mimicking, malicious editing).

Image Generation

On the Safety of Open-Sourced Large Language Models: Does Alignment Really Prevent Them From Being Misused?

no code implementations • 2 Oct 2023 • Hangfan Zhang, Zhimeng Guo, Huaisheng Zhu, Bochuan Cao, Lu Lin, Jinyuan Jia, Jinghui Chen, Dinghao Wu

A natural question is: "Could alignment really prevent those open-sourced large language models from being misused to generate undesired content?"

Text Generation

Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM

1 code implementation • 18 Sep 2023 • Bochuan Cao, Yuanpu Cao, Lu Lin, Jinghui Chen

In this work, we introduce a Robustly Aligned LLM (RA-LLM) to defend against potential alignment-breaking attacks.

Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time

1 code implementation • 25 Nov 2022 • Huaxiu Yao, Caroline Choi, Bochuan Cao, Yoonho Lee, Pang Wei Koh, Chelsea Finn

Temporal shifts -- distribution shifts arising from the passage of time -- often occur gradually and have the additional structure of timestamp metadata.

Continual Learning • Domain Generalization • +4

OmniLytics: A Blockchain-based Secure Data Market for Decentralized Machine Learning

no code implementations • 12 Jul 2021 • Jiacheng Liang, Songze Li, Bochuan Cao, Wensi Jiang, Chaoyang He

Utilizing OmniLytics, many distributed data owners can contribute their private data to collectively train an ML model requested by model owners, and receive compensation for their data contributions.

BIG-bench Machine Learning
