Search Results for author: Tongxin Yuan

Found 3 papers, 2 papers with code

Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions

1 code implementation • 5 Aug 2024 • Xinbei Ma, Yiting Wang, Yao Yao, Tongxin Yuan, Aston Zhang, Zhuosheng Zhang, Hai Zhao

This paper investigates the faithfulness of multimodal large language model (MLLM) agents in the graphical user interface (GUI) environment, aiming to address the research question of whether multimodal GUI agents can be distracted by environmental context.

Language Modeling

Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science

no code implementations • 6 Feb 2024 • Xiangru Tang, Qiao Jin, Kunlun Zhu, Tongxin Yuan, Yichi Zhang, Wangchunshu Zhou, Meng Qu, Yilun Zhao, Jian Tang, Zhuosheng Zhang, Arman Cohan, Zhiyong Lu, Mark Gerstein

Intelligent agents powered by large language models (LLMs) have demonstrated substantial promise in autonomously conducting experiments and facilitating scientific discoveries across various disciplines.

R-Judge: Benchmarking Safety Risk Awareness for LLM Agents

1 code implementation • 18 Jan 2024 • Tongxin Yuan, Zhiwei He, Lingzhong Dong, Yiming Wang, Ruijie Zhao, Tian Xia, Lizhen Xu, Binglin Zhou, Fangqi Li, Zhuosheng Zhang, Rui Wang, Gongshen Liu

We introduce R-Judge, a benchmark crafted to evaluate the proficiency of LLMs in judging and identifying safety risks given agent interaction records.

Benchmarking
