AgentEval is the benchmark suite of the AgentGym framework, which is designed to develop and evaluate generally capable Large Language Model-based (LLM-based) agents. It provides a set of tasks and environments for assessing the performance of these agents¹².

The AgentGym framework includes diverse interactive environments and tasks in a unified format, with support for real-time feedback and concurrency, both of which are essential for developing and scaling LLM-based agents. AgentEval, together with the trajectory sets AgentTraj and AgentTraj-L, lets researchers and developers measure agent capabilities across a broad spectrum of tasks and environments².
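
To make the "unified format" idea concrete, here is a minimal sketch of the kind of text-based agent-environment loop such a framework standardizes. All names here (`TaskEnv`, `StepResult`, `run_episode`) are illustrative assumptions for this sketch, not AgentGym's actual API; see the repository for the real interface.

```python
# Hypothetical sketch of a unified text-based agent-environment loop.
# The class and method names are illustrative assumptions, not AgentGym's API.
from dataclasses import dataclass


@dataclass
class StepResult:
    observation: str   # textual feedback from the environment
    reward: float      # scalar signal for the last action
    done: bool         # whether the episode has terminated


class TaskEnv:
    """Toy stand-in for one environment behind a unified text interface."""

    def __init__(self, goal: str):
        self.goal = goal
        self.turns = 0

    def reset(self) -> str:
        self.turns = 0
        return f"Task: {self.goal}"

    def step(self, action: str) -> StepResult:
        self.turns += 1
        solved = self.goal.lower() in action.lower()
        done = solved or self.turns >= 5  # cap episode length
        return StepResult(f"Env saw: {action!r}", 1.0 if solved else 0.0, done)


def run_episode(env: TaskEnv, agent) -> float:
    """Roll out one episode; `agent` maps an observation string to an action string."""
    obs = env.reset()
    total = 0.0
    while True:
        result = env.step(agent(obs))
        total += result.reward
        obs = result.observation
        if result.done:
            return total


if __name__ == "__main__":
    # A trivial echo "agent" stands in for an LLM call.
    env = TaskEnv(goal="open the door")
    print(f"episode return: {run_episode(env, agent=lambda obs: 'I open the door')}")
```

The value of a uniform reset/step contract like this is that a single evaluation harness can score very different agents across heterogeneous tasks, which is what a benchmark suite such as AgentEval relies on.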

(1) AgentGym: Evolving Large Language Model-based Agents across Diverse Environments. https://arxiv.org/html/2406.04151v1.
(2) GitHub - WooooDyy/AgentGym: Code and implementations for the paper. https://github.com/WooooDyy/AgentGym.
(3) AgentGym project page. https://agentgym.github.io/.
