Mobile-Env: An Evaluation Platform and Benchmark for LLM-GUI Interaction

14 May 2023 · Danyang Zhang, Hongshen Xu, Zihan Zhao, Lu Chen, Ruisheng Cao, Kai Yu

The User Interface (UI) is pivotal for human interaction with the digital world, facilitating efficient control of machines, information navigation, and complex task completion. To achieve easy, efficient, and free interactions, researchers have been exploring the potential of encapsulating traditional Programming Language Interfaces (PLIs) and Graphical User Interfaces (GUIs) into Natural Language Interfaces (NLIs). However, due to the limited capabilities of small models, prior work has mainly focused on tasks that require only a single step, which largely constrains the application of NLIs. Recently, Large Language Models (LLMs) have exhibited robust reasoning and planning abilities, yet their potential for multi-turn interactions in complex environments remains under-explored. To assess LLMs as NLIs in real-world graphical environments, we introduce Mobile-Env, a GUI interaction platform built specifically for mobile apps. Mobile-Env enhances interaction flexibility, task extensibility, and environment adaptability compared with previous environments. A GUI task set based on the WikiHow app is collected on Mobile-Env to form a benchmark covering a range of GUI interaction capabilities. We further conduct comprehensive evaluations of LLM agents, including various versions of GPT, LLaMA 2, and AgentLM, on the WikiHow task set to gain insights into the potential and challenges of LLMs in GUI interactions.
