Search Results for author: Lizhen Xu

Found 1 paper, 1 paper with code

R-Judge: Benchmarking Safety Risk Awareness for LLM Agents

1 code implementation · 18 Jan 2024 · Tongxin Yuan, Zhiwei He, Lingzhong Dong, Yiming Wang, Ruijie Zhao, Tian Xia, Lizhen Xu, Binglin Zhou, Fangqi Li, Zhuosheng Zhang, Rui Wang, Gongshen Liu

We introduce R-Judge, a benchmark crafted to evaluate the proficiency of LLMs in judging and identifying safety risks given agent interaction records.

Benchmarking
