no code implementations • 5 Mar 2025 • Yurui Chang, Bochuan Cao, Lu Lin
While large language models have demonstrated exceptional performance across a wide range of tasks, they remain susceptible to hallucinations -- generating plausible yet factually incorrect content.
no code implementations • 6 Feb 2025 • Hanyu Wang, Bochuan Cao, Yuanpu Cao, Jinghui Chen
Large language models (LLMs) are known to struggle with consistently generating truthful responses.
1 code implementation • 9 Dec 2024 • Bochuan Cao, Jinyuan Jia, Chuxuan Hu, Wenbo Guo, Zhen Xiang, Jinghui Chen, Bo Li, Dawn Song
Existing backdoor attacks require either retraining the classifier with some clean data or modifying the model's architecture.
1 code implementation • 28 Oct 2024 • Yaopei Zeng, Yuanpu Cao, Bochuan Cao, Yurui Chang, Jinghui Chen, Lu Lin
Recent advances in diffusion models have significantly enhanced the quality of image synthesis, yet they have also introduced serious safety concerns, particularly the generation of Not Safe for Work (NSFW) content.
no code implementations • 4 Jun 2024 • Guangliang Liu, Haitao Mao, Bochuan Cao, Zhiyu Xue, Xitong Zhang, Rongrong Wang, Jiliang Tang, Kristen Johnson
Our findings are verified in two settings: (1) multi-round question answering, where we comprehensively demonstrate that intrinsic self-correction can progressively introduce performance gains through iterative interactions, ultimately converging to stable performance; and (2) intrinsic self-correction for enhanced morality, where we provide empirical evidence that iteratively applying instructions reduces model uncertainty until it converges, which in turn drives convergence of both the calibration error and the self-correction performance, ultimately resulting in a stable state of intrinsic self-correction.
no code implementations • 30 May 2024 • Yurui Chang, Bochuan Cao, Yujia Wang, Jinghui Chen, Lu Lin
In this study, we introduce a counterfactual explanation framework based on joint prompt attribution, XPrompt, which aims to explain how a few prompt texts collaboratively influence the LLM's complete generation.
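The excerpt does not spell out the attribution procedure, so the sketch below only illustrates the general joint-attribution idea: score how subsets of prompt segments jointly shift the likelihood of a fixed generation when they are removed. The function names, the GPT-2 model choice, and the subset-size cap are illustrative assumptions, not XPrompt's implementation.

```python
# Illustrative sketch only: joint prompt attribution by removing subsets of
# prompt segments and measuring the change in log-likelihood of a fixed
# generation. This is NOT the XPrompt algorithm, just the general idea.
from itertools import combinations

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def generation_logprob(prompt: str, generation: str) -> float:
    """Sum of log-probabilities of `generation` tokens given `prompt`."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    gen_ids = tok(generation, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, gen_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # log-probs at each position predict the token at the next position
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    gen_positions = range(prompt_ids.shape[1] - 1, input_ids.shape[1] - 1)
    return sum(log_probs[pos, input_ids[0, pos + 1]].item() for pos in gen_positions)

def joint_attribution(segments: list[str], generation: str, max_subset: int = 2):
    """Score subsets of prompt segments by how much removing them lowers
    the generation's log-likelihood (a crude counterfactual signal)."""
    full = generation_logprob(" ".join(segments), generation)
    scores = {}
    for k in range(1, max_subset + 1):
        for subset in combinations(range(len(segments)), k):
            kept = [s for i, s in enumerate(segments) if i not in subset]
            if not kept:  # skip the degenerate empty-prompt case
                continue
            scores[subset] = full - generation_logprob(" ".join(kept), generation)
    return scores
```

For example, `joint_attribution(["Translate to French:", "Keep it formal.", "Hello"], "Bonjour")` would rank which instruction pieces, alone or in pairs, mattered most for the observed output.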
1 code implementation • 28 May 2024 • Yuanpu Cao, Tianrong Zhang, Bochuan Cao, Ziyi Yin, Lu Lin, Fenglong Ma, Jinghui Chen
Researchers have been studying approaches to steer the behavior of Large Language Models (LLMs) and build personalized LLMs tailored for various applications.
no code implementations • 22 May 2024 • Tianrong Zhang, Bochuan Cao, Yuanpu Cao, Lu Lin, Prasenjit Mitra, Jinghui Chen
The recent breakthrough in large language models (LLMs) such as ChatGPT has revolutionized production processes at an unprecedented pace.
no code implementations • 14 Dec 2023 • Changjiang Li, Ren Pang, Bochuan Cao, Zhaohan Xi, Jinghui Chen, Shouling Ji, Ting Wang
Recent studies have shown that contrastive learning, like supervised learning, is highly vulnerable to backdoor attacks, wherein malicious functions are injected into target models and activated only by specific triggers.
1 code implementation • 15 Nov 2023 • Yuanpu Cao, Bochuan Cao, Jinghui Chen
In this work, we show that it is possible to conduct stealthy and persistent unalignment on large language models via backdoor injections.
1 code implementation • NeurIPS 2023 • Bochuan Cao, Changjiang Li, Ting Wang, Jinyuan Jia, Bo Li, Jinghui Chen
IMPRESS is based on the key observation that imperceptible perturbations can lead to a perceptible inconsistency between the original image and its diffusion-reconstructed counterpart. This inconsistency can be exploited to devise a new optimization strategy for purifying the image, which may weaken the protection of the original image against unauthorized data usage (e.g., style mimicking, malicious editing).
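The excerpt names the signal (inconsistency between an image and its diffusion reconstruction) but not the optimization itself; the sketch below shows one generic way such a consistency objective could drive purification under an L_inf budget. The `reconstruct` argument stands in for a diffusion encode-decode pass, and the budget, step count, and learning rate are placeholder assumptions rather than the paper's settings.

```python
# Illustrative sketch only: purify a (possibly protection-perturbed) image by
# minimizing the inconsistency between the image and its diffusion
# reconstruction, while staying within an L_inf budget of the input.
# `reconstruct` is a placeholder for a differentiable diffusion encode-decode pass.
from typing import Callable

import torch

def purify(image: torch.Tensor,
           reconstruct: Callable[[torch.Tensor], torch.Tensor],
           budget: float = 8 / 255,
           steps: int = 100,
           lr: float = 1e-2) -> torch.Tensor:
    """Return a purified copy of `image` (values in [0, 1], shape CxHxW)."""
    delta = torch.zeros_like(image, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        purified = (image + delta).clamp(0, 1)
        # consistency loss: the purified image should match its own reconstruction
        loss = torch.nn.functional.mse_loss(reconstruct(purified), purified)
        opt.zero_grad()
        loss.backward()
        opt.step()
        # keep the edit within an L_inf budget of the original input
        with torch.no_grad():
            delta.clamp_(-budget, budget)
    return (image + delta).detach().clamp(0, 1)

if __name__ == "__main__":
    # Toy check with an identity "reconstruction" (loss is trivially zero).
    img = torch.rand(3, 64, 64)
    print(purify(img, reconstruct=lambda x: x, steps=5).shape)
```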
no code implementations • 2 Oct 2023 • Hangfan Zhang, Zhimeng Guo, Huaisheng Zhu, Bochuan Cao, Lu Lin, Jinyuan Jia, Jinghui Chen, Dinghao Wu
A natural question is "could alignment really prevent those open-sourced large language models from being misused to generate undesired content?"
1 code implementation • 18 Sep 2023 • Bochuan Cao, Yuanpu Cao, Lu Lin, Jinghui Chen
In this work, we introduce a Robustly Aligned LLM (RA-LLM) to defend against potential alignment-breaking attacks.
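The excerpt does not describe how RA-LLM performs its check; as a loosely related illustration, the sketch below shows one plausible robustness-style defense: randomly drop parts of a request several times and flag it if the aligned model refuses on enough perturbed copies. The drop rate, vote threshold, and keyword-based `is_refusal` heuristic are all assumptions for illustration, not RA-LLM's actual parameters or procedure.

```python
# Illustrative sketch only: a robustness-style alignment check that randomly
# drops words from a request several times and flags the request if the model
# refuses on enough of the perturbed copies. All thresholds and the refusal
# heuristic are assumptions made for this example.
import random
from typing import Callable

REFUSAL_MARKERS = ("i cannot", "i can't", "i'm sorry", "as an ai")

def is_refusal(response: str) -> bool:
    """Crude keyword heuristic for detecting a refusal."""
    lowered = response.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def randomly_drop(request: str, drop_rate: float = 0.3) -> str:
    """Drop each word independently with probability `drop_rate`."""
    kept = [w for w in request.split() if random.random() > drop_rate]
    return " ".join(kept) if kept else request

def robust_alignment_check(request: str,
                           generate: Callable[[str], str],
                           n_samples: int = 10,
                           threshold: float = 0.5) -> bool:
    """Return True if the request looks like an alignment-breaking attempt,
    i.e., the model refuses on at least `threshold` of the perturbed copies."""
    refusals = sum(is_refusal(generate(randomly_drop(request)))
                   for _ in range(n_samples))
    return refusals / n_samples >= threshold
```

The intuition behind such a check is that a benign request still reads as benign after random dropping, whereas a carefully crafted alignment-breaking prompt tends to lose its effect, so the model's built-in alignment reasserts itself and refuses.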
1 code implementation • 25 Nov 2022 • Huaxiu Yao, Caroline Choi, Bochuan Cao, Yoonho Lee, Pang Wei Koh, Chelsea Finn
Temporal shifts -- distribution shifts arising from the passage of time -- often occur gradually and have the additional structure of timestamp metadata.
no code implementations • 12 Jul 2021 • Jiacheng Liang, Songze Li, Bochuan Cao, Wensi Jiang, Chaoyang He
Utilizing OmniLytics, many distributed data owners can contribute their private data to collectively train an ML model requested by some model owners, and receive compensation for their data contributions.