Search Results for author: Peixuan Han

Found 3 papers, 2 papers with code

Internal Activation as the Polar Star for Steering Unsafe LLM Behavior

no code implementations • 3 Feb 2025 • Peixuan Han, Cheng Qian, Xiusi Chen, Yuji Zhang, Denghui Zhang, Heng Ji

Large language models (LLMs) have demonstrated exceptional capabilities across a wide range of tasks but also pose significant risks due to their potential to generate harmful content.

Safety Alignment

EscapeBench: Pushing Language Models to Think Outside the Box

1 code implementation • 18 Dec 2024 • Cheng Qian, Peixuan Han, Qinyu Luo, Bingxiang He, Xiusi Chen, Yuji Zhang, Hongyi Du, Jiarui Yao, Xiaocheng Yang, Denghui Zhang, Yunzhu Li, Heng Ji

Language model agents excel in long-session planning and reasoning, but existing benchmarks primarily focus on goal-oriented tasks with explicit objectives, neglecting creative adaptation in unfamiliar environments.

Language Modeling, Language Modelling

Enhancing Dense Retrievers' Robustness with Group-level Reweighting

2 code implementations • 25 Oct 2023 • Peixuan Han, Zhenghao Liu, Zhiyuan Liu, Chenyan Xiong

In this paper, we introduce WebDRO, an efficient approach for clustering the web graph data and optimizing group weights to enhance the robustness of dense retrieval models.

Clustering, Link Prediction, +2 more
