Search Results for author: Caishuang Huang

Found 13 papers, 12 papers with code

SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance

1 code implementation26 Jun 2024 Caishuang Huang, Wanxu Zhao, Rui Zheng, Huijie Lv, WenYu Zhan, Shihan Dou, Sixian Li, Xiao Wang, Enyu Zhou, Junjie Ye, Yuming Yang, Tao Gui, Qi Zhang, Xuanjing Huang

As the development of large language models (LLMs) rapidly advances, securing these models effectively without compromising their utility has become a pivotal area of research.

Safety Alignment

CodeChameleon: Personalized Encryption Framework for Jailbreaking Large Language Models

1 code implementation26 Feb 2024 Huijie Lv, Xiao Wang, Yuansen Zhang, Caishuang Huang, Shihan Dou, Junjie Ye, Tao Gui, Qi Zhang, Xuanjing Huang

Adversarial misuse, particularly through `jailbreaking' that circumvents a model's safety and ethical protocols, poses a significant challenge for Large Language Models (LLMs).

Code Completion Response Generation

ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages

1 code implementation16 Feb 2024 Junjie Ye, Sixian Li, Guanyu Li, Caishuang Huang, Songyang Gao, Yilong Wu, Qi Zhang, Tao Gui, Xuanjing Huang

Tool learning is widely acknowledged as a foundational approach or deploying large language models (LLMs) in real-world scenarios.

RoTBench: A Multi-Level Benchmark for Evaluating the Robustness of Large Language Models in Tool Learning

1 code implementation16 Jan 2024 Junjie Ye, Yilong Wu, Songyang Gao, Caishuang Huang, Sixian Li, Guanyu Li, Xiaoran Fan, Qi Zhang, Tao Gui, Xuanjing Huang

To bridge this gap, we introduce RoTBench, a multi-level benchmark for evaluating the robustness of LLMs in tool learning.

ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios

1 code implementation1 Jan 2024 Junjie Ye, Guanyu Li, Songyang Gao, Caishuang Huang, Yilong Wu, Sixian Li, Xiaoran Fan, Shihan Dou, Tao Ji, Qi Zhang, Tao Gui, Xuanjing Huang

Existing evaluations of tool learning primarily focus on validating the alignment of selected tools for large language models (LLMs) with expected outcomes.

Cannot find the paper you are looking for? You can Submit a new open access paper.