Search Results for author: Shuyan Zhou

Found 19 papers, 15 papers with code

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

no code implementations 11 Apr 2024 Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li, Siheng Zhao, Ruisheng Cao, Toh Jing Hua, Zhoujun Cheng, Dongchan Shin, Fangyu Lei, Yitao Liu, Yiheng Xu, Shuyan Zhou, Silvio Savarese, Caiming Xiong, Victor Zhong, Tao Yu

Autonomous agents that accomplish complex computer tasks with minimal human interventions have the potential to transform human-computer interaction, significantly enhancing accessibility and productivity.

Benchmarking

VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks

1 code implementation 24 Jan 2024 Jing Yu Koh, Robert Lo, Lawrence Jang, Vikram Duvvur, Ming Chong Lim, Po-Yu Huang, Graham Neubig, Shuyan Zhou, Ruslan Salakhutdinov, Daniel Fried

Through extensive quantitative and qualitative analysis, we identify several limitations of text-only LLM agents, and reveal gaps in the capabilities of state-of-the-art multimodal language agents.

WebArena: A Realistic Web Environment for Building Autonomous Agents

1 code implementation 25 Jul 2023 Shuyan Zhou, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Tianyue Ou, Yonatan Bisk, Daniel Fried, Uri Alon, Graham Neubig

Building upon our environment, we release a set of benchmark tasks focusing on evaluating the functional correctness of task completions.

Hierarchical Prompting Assists Large Language Model on Web Navigation

3 code implementations 23 May 2023 Abishek Sridhar, Robert Lo, Frank F. Xu, Hao Zhu, Shuyan Zhou

Large language models (LLMs) struggle on processing complicated observations in interactive decision making tasks.

Decision Making Language Modelling +1

CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code

1 code implementation 10 Feb 2023 Shuyan Zhou, Uri Alon, Sumit Agarwal, Graham Neubig

We release five language-specific pretrained models to use with our publicly available code.

Code Generation

Causal Reasoning of Entities and Events in Procedural Texts

1 code implementation 26 Jan 2023 Li Zhang, Hainiu Xu, Yue Yang, Shuyan Zhou, Weiqiu You, Manni Arora, Chris Callison-Burch

By injecting the causal relations between entities and events as intermediate reasoning steps in our representation, we further boost the performance to .67 F1.

Execution-Based Evaluation for Open-Domain Code Generation

1 code implementation 20 Dec 2022 Zhiruo Wang, Shuyan Zhou, Daniel Fried, Graham Neubig

To extend the scope of coding queries to more realistic settings, we propose ODEX, the first Open-Domain EXecution-based natural language (NL) to Python code generation dataset.

Code Generation Memorization

PAL: Program-aided Language Models

2 code implementations 18 Nov 2022 Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, Graham Neubig

Much of this success can be attributed to prompting methods such as "chain-of-thought", which employ LLMs for both understanding the problem description by decomposing it into steps, as well as solving each step of the problem.

Arithmetic Reasoning GSM8K +2

Language Models of Code are Few-Shot Commonsense Learners

1 code implementation 13 Oct 2022 Aman Madaan, Shuyan Zhou, Uri Alon, Yiming Yang, Graham Neubig

In all these natural language tasks, we show that using our approach, a code generation LM (CODEX) outperforms natural-LMs that are fine-tuned on the target task (e.g., T5) and other strong LMs such as GPT-3 in the few-shot setting.

Code Generation

MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages

1 code implementation 16 Mar 2022 Zhiruo Wang, Grace Cuenca, Shuyan Zhou, Frank F. Xu, Graham Neubig

While there has been a recent burgeoning of applications at the intersection of natural and programming languages, such as code generation and code summarization, these applications are usually English-centric.

Code Generation Code Summarization

Show Me More Details: Discovering Hierarchies of Procedures from Semi-structured Web Data

1 code implementation ACL 2022 Shuyan Zhou, Li Zhang, Yue Yang, Qing Lyu, Pengcheng Yin, Chris Callison-Burch, Graham Neubig

To this end, we develop a simple and efficient method that links steps (e.g., "purchase a camera") in an article to other articles with similar goals (e.g., "how to choose a camera"), recursively constructing the KB.

Retrieval Video Retrieval

Procedures as Programs: Hierarchical Control of Situated Agents through Natural Language

no code implementations NAACL (SUKI) 2022 Shuyan Zhou, Pengcheng Yin, Graham Neubig

When humans conceive how to perform a particular task, they do so hierarchically: splitting higher-level tasks into smaller sub-tasks.

Instruction Following

Soft Gazetteers for Low-Resource Named Entity Recognition

1 code implementation ACL 2020 Shruti Rijhwani, Shuyan Zhou, Graham Neubig, Jaime Carbonell

However, designing such features for low-resource languages is challenging, because exhaustive entity gazetteers do not exist in these languages.

Cross-Lingual Entity Linking Entity Linking +4

Improving Candidate Generation for Low-resource Cross-lingual Entity Linking

1 code implementation TACL 2020 Shuyan Zhou, Shruti Rijhwani, John Wieting, Jaime Carbonell, Graham Neubig

Cross-lingual entity linking (XEL) is the task of finding referents in a target-language knowledge base (KB) for mentions extracted from source-language texts.

Cross-Lingual Entity Linking Entity Linking +1

Towards Zero-resource Cross-lingual Entity Linking

1 code implementation WS 2019 Shuyan Zhou, Shruti Rijhwani, Graham Neubig

Cross-lingual entity linking (XEL) grounds named entities in a source language to an English Knowledge Base (KB), such as Wikipedia.

Cross-Lingual Entity Linking Entity Linking

Improving Robustness of Neural Machine Translation with Multi-task Learning

1 code implementation WS 2019 Shuyan Zhou, Xiangkai Zeng, Yingqi Zhou, Antonios Anastasopoulos, Graham Neubig

While neural machine translation (NMT) achieves remarkable performance on clean, in-domain text, performance is known to degrade drastically when facing text which is full of typos, grammatical errors and other varieties of noise.

Machine Translation Multi-Task Learning +2

Aggregated Semantic Matching for Short Text Entity Linking

no code implementations CONLL 2018 Feng Nie, Shuyan Zhou, Jing Liu, Jinpeng Wang, Chin-Yew Lin, Rong Pan

The task of entity linking aims to identify concepts mentioned in a text fragment and link them to a reference knowledge base.

Card Games Entity Linking +2
