Search Results for author: Lizhen Xu

Found 1 papers, 1 papers with code

R-Judge: Benchmarking Safety Risk Awareness for LLM Agents

1 code implementation • 18 Jan 2024 • Tongxin Yuan, Zhiwei He, Lingzhong Dong, Yiming Wang, Ruijie Zhao, Tian Xia, Lizhen Xu, Binglin Zhou, Fangqi Li, Zhuosheng Zhang, Rui Wang, Gongshen Liu

We introduce R-Judge, a benchmark crafted to evaluate the proficiency of LLMs in judging and identifying safety risks given agent interaction records.

Benchmarking

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.