no code implementations • 28 Jan 2025 • Faria Huq, Zora Zhiruo Wang, Frank F. Xu, Tianyue Ou, Shuyan Zhou, Jeffrey P. Bigham, Graham Neubig
CowPilot reduces the number of steps humans need to perform by allowing agents to propose next steps, while users are able to pause, reject, or take alternative actions.
1 code implementation • 18 Dec 2024 • Frank F. Xu, Yufan Song, Boxuan Li, Yuxuan Tang, Kritanjali Jain, Mengxue Bao, Zora Z. Wang, Xuhui Zhou, Zhitong Guo, Murong Cao, Mingyang Yang, Hao Yang Lu, Amaad Martin, Zhe Su, Leander Maben, Raj Mehta, Wayne Chi, Lawrence Jang, Yiqing Xie, Shuyan Zhou, Graham Neubig
We interact with computers on an everyday basis, be it in everyday life or work, and many aspects of work can be done entirely with access to a computer and the Internet.
1 code implementation • 21 Oct 2024 • Yueqi Song, Frank Xu, Shuyan Zhou, Graham Neubig
To do so, we propose two varieties of agents: (1) an API-calling agent that attempts to perform online tasks through APIs only, similar to traditional coding agents, and (2) a Hybrid Agent that can interact with online data through both web browsing and APIs.
1 code implementation • 11 Oct 2024 • Priyanshu Kumar, Elaine Lau, Saranya Vijayakumar, Tu Trinh, Scale Red Team, Elaine Chang, Vaughn Robinson, Sean Hendryx, Shuyan Zhou, Matt Fredrikson, Summer Yue, Zifan Wang
We study an open question in this work: does the desired safety refusal, typically enforced in chat contexts, generalize to non-chat and agentic use cases?
no code implementations • 24 Sep 2024 • Tianyue Ou, Frank F. Xu, Aman Madaan, Jiarui Liu, Robert Lo, Abishek Sridhar, Sudipta Sengupta, Dan Roth, Graham Neubig, Shuyan Zhou
LLMs can now act as autonomous agents that interact with digital environments and complete specific objectives (e. g., arranging an online meeting).
1 code implementation • 18 Jun 2024 • Yichen Pan, Dehan Kong, Sida Zhou, Cheng Cui, Yifei Leng, Bing Jiang, Hangyu Liu, Yanyi Shang, Shuyan Zhou, Tongshuang Wu, Zhengyang Wu
To bridge this gap, we introduce WebCanvas, an innovative online evaluation framework for web agents that effectively addresses the dynamic nature of web interactions.
1 code implementation • 11 Apr 2024 • Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li, Siheng Zhao, Ruisheng Cao, Toh Jing Hua, Zhoujun Cheng, Dongchan Shin, Fangyu Lei, Yitao Liu, Yiheng Xu, Shuyan Zhou, Silvio Savarese, Caiming Xiong, Victor Zhong, Tao Yu
Autonomous agents that accomplish complex computer tasks with minimal human interventions have the potential to transform human-computer interaction, significantly enhancing accessibility and productivity.
1 code implementation • 24 Jan 2024 • Jing Yu Koh, Robert Lo, Lawrence Jang, Vikram Duvvur, Ming Chong Lim, Po-Yu Huang, Graham Neubig, Shuyan Zhou, Ruslan Salakhutdinov, Daniel Fried
Through extensive quantitative and qualitative analysis, we identify several limitations of text-only LLM agents, and reveal gaps in the capabilities of state-of-the-art multimodal language agents.
1 code implementation • 25 Jul 2023 • Shuyan Zhou, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Tianyue Ou, Yonatan Bisk, Daniel Fried, Uri Alon, Graham Neubig
Building upon our environment, we release a set of benchmark tasks focusing on evaluating the functional correctness of task completions.
3 code implementations • 23 May 2023 • Abishek Sridhar, Robert Lo, Frank F. Xu, Hao Zhu, Shuyan Zhou
Large language models (LLMs) struggle on processing complicated observations in interactive decision making tasks.
no code implementations • 1 May 2023 • Patrick Fernandes, Aman Madaan, Emmy Liu, António Farinhas, Pedro Henrique Martins, Amanda Bertsch, José G. C. de Souza, Shuyan Zhou, Tongshuang Wu, Graham Neubig, André F. T. Martins
Many recent advances in natural language generation have been fueled by training large language models on internet-scale data.
1 code implementation • 10 Feb 2023 • Shuyan Zhou, Uri Alon, Sumit Agarwal, Graham Neubig
We release five language-specific pretrained models to use with our publicly available code.
1 code implementation • 26 Jan 2023 • Li Zhang, Hainiu Xu, Yue Yang, Shuyan Zhou, Weiqiu You, Manni Arora, Chris Callison-Burch
By injecting the causal relations between entities and events as intermediate reasoning steps in our representation, we further boost the performance to . 67 F1.
1 code implementation • 20 Dec 2022 • Zhiruo Wang, Shuyan Zhou, Daniel Fried, Graham Neubig
To extend the scope of coding queries to more realistic settings, we propose ODEX, the first Open-Domain EXecution-based natural language (NL) to Python code generation dataset.
3 code implementations • 18 Nov 2022 • Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, PengFei Liu, Yiming Yang, Jamie Callan, Graham Neubig
Much of this success can be attributed to prompting methods such as "chain-of-thought'', which employ LLMs for both understanding the problem description by decomposing it into steps, as well as solving each step of the problem.
Ranked #43 on
Math Word Problem Solving
on MATH
2 code implementations • 13 Oct 2022 • Aman Madaan, Shuyan Zhou, Uri Alon, Yiming Yang, Graham Neubig
In all these natural language tasks, we show that using our approach, a code generation LM (CODEX) outperforms natural-LMs that are fine-tuned on the target task (e. g., T5) and other strong LMs such as GPT-3 in the few-shot setting.
2 code implementations • 13 Jul 2022 • Shuyan Zhou, Uri Alon, Frank F. Xu, Zhiruo Wang, Zhengbao Jiang, Graham Neubig
Publicly available source-code libraries are continuously growing and changing.
1 code implementation • 16 Mar 2022 • Zhiruo Wang, Grace Cuenca, Shuyan Zhou, Frank F. Xu, Graham Neubig
While there has been a recent burgeoning of applications at the intersection of natural and programming languages, such as code generation and code summarization, these applications are usually English-centric.
1 code implementation • ACL 2022 • Shuyan Zhou, Li Zhang, Yue Yang, Qing Lyu, Pengcheng Yin, Chris Callison-Burch, Graham Neubig
To this end, we develop a simple and efficient method that links steps (e. g., "purchase a camera") in an article to other articles with similar goals (e. g., "how to choose a camera"), recursively constructing the KB.
no code implementations • NAACL (SUKI) 2022 • Shuyan Zhou, Pengcheng Yin, Graham Neubig
When humans conceive how to perform a particular task, they do so hierarchically: splitting higher-level tasks into smaller sub-tasks.
1 code implementation • ACL 2020 • Shruti Rijhwani, Shuyan Zhou, Graham Neubig, Jaime Carbonell
However, designing such features for low-resource languages is challenging, because exhaustive entity gazetteers do not exist in these languages.
1 code implementation • TACL 2020 • Shuyan Zhou, Shruti Rijhawani, John Wieting, Jaime Carbonell, Graham Neubig
Cross-lingual entity linking (XEL) is the task of finding referents in a target-language knowledge base (KB) for mentions extracted from source-language texts.
1 code implementation • WS 2019 • Shuyan Zhou, Shruti Rijhwani, Graham Neubig
Cross-lingual entity linking (XEL) grounds named entities in a source language to an English Knowledge Base (KB), such as Wikipedia.
1 code implementation • WS 2019 • Shuyan Zhou, Xiangkai Zeng, Yingqi Zhou, Antonios Anastasopoulos, Graham Neubig
While neural machine translation (NMT) achieves remarkable performance on clean, in-domain text, performance is known to degrade drastically when facing text which is full of typos, grammatical errors and other varieties of noise.
no code implementations • CONLL 2018 • Feng Nie, Shuyan Zhou, Jing Liu, Jinpeng Wang, Chin-Yew Lin, Rong pan
The task of entity linking aims to identify concepts mentioned in a text fragments and link them to a reference knowledge base.