Search Results for author: Yesai Wu

Found 7 papers, 6 papers with code

Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs

1 code implementation • 18 Oct 2024 • Runchu Tian, Yanghao Li, Yuepeng Fu, Siyang Deng, Qinyu Luo, Cheng Qian, Shuo Wang, Xin Cong, Zhong Zhang, Yesai Wu, Yankai Lin, Huadong Wang, Xiaojiang Liu

These experiments reveal that while most current models are robust against the "lost in the middle" issue, there exist significant biases related to the spacing of relevant information pieces.

Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance

1 code implementation • 16 Oct 2024 • Yaxi Lu, Shenzhi Yang, Cheng Qian, Guirong Chen, Qinyu Luo, Yesai Wu, Huadong Wang, Xin Cong, Zhong Zhang, Yankai Lin, Weiwen Liu, Yasheng Wang, Zhiyuan Liu, Fangming Liu, Maosong Sun

The labeled data is used to train a reward model that simulates human judgment and serves as an automatic evaluator of the proactiveness of LLM agents.

Learning Evolving Tools for Large Language Models

1 code implementation • 9 Oct 2024 • Guoxin Chen, Zhong Zhang, Xin Cong, Fangda Guo, Yesai Wu, Yankai Lin, Wenzheng Feng, Yasheng Wang

Tool learning enables large language models (LLMs) to interact with external tools and APIs, greatly expanding the application scope of LLMs.

Investigate-Consolidate-Exploit: A General Strategy for Inter-Task Agent Self-Evolution

No code implementations • 25 Jan 2024 • Cheng Qian, Shihao Liang, Yujia Qin, Yining Ye, Xin Cong, Yankai Lin, Yesai Wu, Zhiyuan Liu, Maosong Sun

This paper introduces Investigate-Consolidate-Exploit (ICE), a novel strategy for enhancing the adaptability and flexibility of AI agents through inter-task self-evolution.

DebugBench: Evaluating Debugging Capability of Large Language Models

1 code implementation • 9 Jan 2024 • Runchu Tian, Yining Ye, Yujia Qin, Xin Cong, Yankai Lin, Yinxu Pan, Yesai Wu, Haotian Hui, Weichuan Liu, Zhiyuan Liu, Maosong Sun

Previous evaluations of LLMs' debugging ability are significantly limited by the risk of data leakage, the scale of the dataset, and the variety of tested bugs.

Code Generation
