DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Mind2Web	HTML-T5-XL	Element Accuracy	73	# 1
Mind2Web	HTML-T5-XL	Operation F1 score	75.6	# 1
Mind2Web	HTML-T5-XL	Step Success Rate	67.1	# 1

A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis

24 Jul 2023 · Izzeddin Gur, Hiroki Furuta, Austin Huang, Mustafa Safdari, Yutaka Matsuo, Douglas Eck, Aleksandra Faust

Pre-trained large language models (LLMs) have recently achieved better generalization and sample efficiency in autonomous web automation. However, performance on real-world websites still suffers from (1) open-domainness, (2) limited context length, and (3) a lack of inductive bias on HTML. We introduce WebAgent, an LLM-driven agent that learns from self-experience to complete tasks on real websites following natural language instructions. WebAgent plans ahead by decomposing instructions into canonical sub-instructions, summarizes long HTML documents into task-relevant snippets, and acts on websites via Python programs generated from those sub-instructions and snippets. We design WebAgent with Flan-U-PaLM, for grounded code generation, and HTML-T5, new pre-trained LLMs for long HTML documents that use local and global attention mechanisms and a mixture of long-span denoising objectives, for planning and summarization. We empirically demonstrate that our modular recipe improves success on real websites by over 50%, and that HTML-T5 is the best model for solving various HTML understanding tasks, achieving an 18.7% higher success rate than the prior method on the MiniWoB web automation benchmark and SoTA performance on Mind2Web, an offline task planning evaluation.
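
The plan, summarize, act loop described in the abstract can be sketched in a few lines of Python. This is only an illustrative outline under stated assumptions, not the authors' implementation: every name here (`run_agent`, `Step`, and the `toy_*` stand-ins) is hypothetical, the stopping rule (the planner returning `None`) is an assumption, and the two models are replaced by trivial string functions so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    sub_instruction: str  # output of the planning stage
    snippet: str          # task-relevant HTML kept by the summarization stage
    program: str          # program text produced by the code-generation stage


def run_agent(
    instruction: str,
    get_html: Callable[[], str],
    plan: Callable[[str, List[Step], str], Optional[str]],
    summarize: Callable[[str, str], str],
    synthesize: Callable[[str, str], str],
    execute: Callable[[str], None],
    max_steps: int = 10,
) -> List[Step]:
    """Hypothetical modular loop: plan -> summarize -> synthesize -> execute."""
    history: List[Step] = []
    for _ in range(max_steps):
        html = get_html()
        sub = plan(instruction, history, html)
        if sub is None:  # assumed stopping rule: planner has nothing left to do
            break
        snippet = summarize(html, sub)
        program = synthesize(sub, snippet)
        execute(program)
        history.append(Step(sub, snippet, program))
    return history


# Toy stand-ins (assumptions): a real system would call the planning and
# summarization model and the code-generation model here instead.
PAGE = '<input id="search">\n<button id="submit">\n<footer>'


def toy_plan(instruction: str, history: List[Step], html: str) -> Optional[str]:
    parts = [p.strip() for p in instruction.split(" then ")]
    return parts[len(history)] if len(history) < len(parts) else None


def toy_summarize(html: str, sub_instruction: str) -> str:
    keyword = sub_instruction.split()[-1]  # crude relevance signal
    return "\n".join(line for line in html.splitlines() if keyword in line)


def toy_synthesize(sub_instruction: str, snippet: str) -> str:
    return f"act({sub_instruction!r}, {snippet!r})"


executed: List[str] = []  # records programs instead of running them in a browser
history = run_agent(
    "type in search then click submit",
    get_html=lambda: PAGE,
    plan=toy_plan,
    summarize=toy_summarize,
    synthesize=toy_synthesize,
    execute=executed.append,
)
```

Passing each stage in as a callable mirrors the modular design the abstract argues for: the planner and summarizer only ever see the instruction, the history, and the page, while the code generator only sees the short sub-instruction and snippet rather than the full HTML document.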
