1 code implementation • 9 Feb 2024 • Mingzhe Xing, Rongkai Zhang, Hui Xue, Qi Chen, Fan Yang, Zhen Xiao
These challenges motivate AndroidArena, an environment and benchmark designed to evaluate LLM agents on a modern operating system.
Date Understanding • Language Modelling