no code implementations • 2 Oct 2024 • R. Thomas McCoy, Shunyu Yao, Dan Friedman, Mathew D. Hardy, Thomas L. Griffiths
In "Embers of Autoregression" (McCoy et al., 2023), we showed that several large language models (LLMs) have some important limitations that are attributable to their origins in next-word prediction.
no code implementations • 25 Sep 2024 • Shunyu Yao, Fei Liu, Xi Lin, Zhichao Lu, Zhenkun Wang, Qingfu Zhang
We propose the first LLM-based multi-objective heuristic search framework, Multi-objective Evolution of Heuristic (MEoH), which integrates LLMs in a zero-shot manner to generate a non-dominated set of heuristics to meet multiple design criteria.
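As a rough illustration of what a "non-dominated set" means here, the sketch below filters candidate heuristics by their objective scores. It assumes every objective is minimized and that each candidate is summarized as a tuple of scores; the function names are hypothetical and are not taken from MEoH.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is no worse than `b` on every objective and
    strictly better on at least one (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (objective 1, objective 2) scores for four candidates.
scores = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (5.0, 1.0)]
print(non_dominated(scores))  # (3.0, 3.0) is dropped: (2.0, 2.0) dominates it
```

The quadratic scan is enough for small candidate pools; larger pools would usually call for a sort-based filter.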
1 code implementation • 22 Aug 2024 • Xun Liang, Hanyu Wang, Yezhaohui Wang, Shichao Song, Jiawei Yang, Simin Niu, Jie Hu, Dan Liu, Shunyu Yao, Feiyu Xiong, Zhiyu Li
This paper systematically reviews the latest advancements in controllable text generation (CTG) for LLMs, offering a comprehensive definition of its core concepts and clarifying the requirements for control conditions and text quality.
no code implementations • 15 Aug 2024 • Shunyu Yao, Mitchy Lee
Modeling such a large state space and storing the information of each state requires exceptional computational resources, which makes it challenging to find the shortest solution to a scrambled Rubik's Cube with limited resources.
1 code implementation • 17 Jun 2024 • Shunyu Yao, Noah Shinn, Pedram Razavi, Karthik Narasimhan
Existing benchmarks do not test language agents on their interaction with human users or their ability to follow domain-specific rules, both of which are vital for deploying them in real-world applications.
1 code implementation • 6 Jun 2024 • Fangfu Liu, HanYang Wang, Shunyu Yao, Shengjun Zhang, Jie zhou, Yueqi Duan
In recent years, there has been rapid development in 3D generation models, opening up new possibilities for applications such as simulating the dynamic movements of 3D objects and customizing their behaviors.
1 code implementation • 27 May 2024 • Zhi Zheng, Shunyu Yao, Zhenkun Wang, Xialiang Tong, Mingxuan Yuan, Ke Tang
The min-max vehicle routing problem (min-max VRP) traverses all given customers by assigning several routes and aims to minimize the length of the longest route.
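The min-max objective described above can be sketched as follows. This is a minimal example, assuming Euclidean distances and routes that start and end at a single depot; neither assumption comes from the abstract, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def route_length(depot: Point, route: List[Point]) -> float:
    """Length of depot -> customers in order -> depot (Euclidean, assumed)."""
    stops = [depot, *route, depot]
    return sum(math.dist(a, b) for a, b in zip(stops, stops[1:]))


def min_max_objective(depot: Point, routes: List[List[Point]]) -> float:
    """The min-max VRP objective: the length of the longest route."""
    return max(route_length(depot, r) for r in routes)


depot = (0.0, 0.0)
routes = [[(3.0, 4.0)], [(0.0, 1.0), (0.0, 2.0)]]
print(min_max_objective(depot, routes))  # 10.0: the first route is the longest
```

A solver then searches over assignments of customers to routes to make this value as small as possible, which balances the routes rather than minimizing their total length.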
1 code implementation • 6 May 2024 • John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, Ofir Press
We investigate how interface design affects the performance of language model agents.
Ranked #3 on Bug fixing on SWE-bench-lite
1 code implementation • 17 Apr 2024 • Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, HaoNing Wu, ZiCheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei LI, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, Haiqiang Wang, Xiangguang Chen, Wenhui Meng, Xiang Pan, Huiying Shi, Han Zhu, Xiaozhong Xu, Lei Sun, Zhenzhong Chen, Shan Liu, Fangyuan Kong, Haotian Fan, Yifang Xu, Haoran Xu, Mengduo Yang, Jie zhou, Jiaze Li, Shijie Wen, Mai Xu, Da Li, Shunyu Yao, Jiazhi Du, WangMeng Zuo, Zhibo Li, Shuai He, Anlong Ming, Huiyuan Fu, Huadong Ma, Yong Wu, Fie Xue, Guozhi Zhao, Lina Du, Jie Guo, Yu Zhang, huimin zheng, JunHao Chen, Yue Liu, Dulan Zhou, Kele Xu, Qisheng Xu, Tao Sun, Zhixiang Ding, Yuhang Hu
This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), in which a variety of solutions were submitted and evaluated on KVQ, a dataset collected from the popular short-form video platform Kuaishou/Kwai.
1 code implementation • 16 Apr 2024 • Quan Shi, Michael Tang, Karthik Narasimhan, Shunyu Yao
In this paper, we introduce the USACO benchmark with 307 problems from the USA Computing Olympiad, along with high-quality unit tests, reference code, and official analyses for each problem.
2 code implementations • 13 Mar 2024 • Bowen Li, Wenhan Wu, Ziwei Tang, Lin Shi, John Yang, Jinyang Li, Shunyu Yao, Chen Qian, Binyuan Hui, Qicheng Zhang, Zhiyin Yu, He Du, Ping Yang, Dahua Lin, Chao Peng, Kai Chen
Recent advancements in large language models (LLMs) have significantly enhanced their coding capabilities.
1 code implementation • 12 Feb 2024 • Zhiyong Wu, Chengcheng Han, Zichen Ding, Zhenmin Weng, Zhoumianze Liu, Shunyu Yao, Tao Yu, Lingpeng Kong
Autonomous interaction with the computer has been a longstanding challenge with great potential, and the recent proliferation of large language models (LLMs) has markedly accelerated progress in building digital agents.
1 code implementation • 19 Oct 2023 • Fei Liu, Xi Lin, Zhenkun Wang, Shunyu Yao, Xialiang Tong, Mingxuan Yuan, Qingfu Zhang
It is also promising that an operator learned from only a few instances can generalize robustly to unseen problems with quite different patterns and settings.
4 code implementations • 10 Oct 2023 • Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, Karthik Narasimhan
We find real-world software engineering to be a rich, sustainable, and challenging testbed for evaluating the next generation of language models.
no code implementations • 9 Oct 2023 • Baian Chen, Chang Shu, Ehsan Shareghi, Nigel Collier, Karthik Narasimhan, Shunyu Yao
Recent efforts have augmented language models (LMs) with external tools or environments, leading to the development of language agents that can reason and act.
Ranked #7 on Question Answering on Bamboogle
no code implementations • 24 Sep 2023 • R. Thomas McCoy, Shunyu Yao, Dan Friedman, Matthew Hardy, Thomas L. Griffiths
This approach - which we call the teleological approach - leads us to identify three factors that we hypothesize will influence LLM accuracy: the probability of the task to be performed, the probability of the target output, and the probability of the provided input.
2 code implementations • 5 Sep 2023 • Theodore R. Sumers, Shunyu Yao, Karthik Narasimhan, Thomas L. Griffiths
Recent efforts have augmented large language models (LLMs) with external resources (e.g., the Internet) or internal control flows (e.g., prompt chaining) for tasks requiring grounding or reasoning, leading to a new class of language agents.
1 code implementation • 17 Jul 2023 • Shunyu Yao, Howard Chen, Austin W. Hanjie, Runzhe Yang, Karthik Narasimhan
Text generation under constraints has seen increasing interest in natural language processing, especially with the rapidly improving capabilities of large language models.
2 code implementations • NeurIPS 2023 • John Yang, Akshara Prabhakar, Karthik Narasimhan, Shunyu Yao
Our framework is language and platform agnostic, uses self-contained Docker environments to provide safe and reproducible execution, and is compatible out-of-the-box with traditional seq2seq coding methods, while enabling the development of new methods for interactive code generation.
1 code implementation • 24 May 2023 • Michael Tang, Shunyu Yao, John Yang, Karthik Narasimhan
We propose Referral-Augmented Retrieval (RAR), a simple technique that concatenates document indices with referrals, i.e., text from other documents that cite or link to the given document, to provide significant performance gains for zero-shot information retrieval.
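The concatenation step can be pictured with a small sketch. It assumes each document's index entry is plain text and that referrals have already been collected per document; the function name, separator, and data layout are hypothetical and are not the paper's implementation.

```python
from typing import Dict, List


def augment_with_referrals(
    documents: Dict[str, str],
    referrals: Dict[str, List[str]],
    sep: str = "\n",
) -> Dict[str, str]:
    """Append each document's referral texts to its own index text.
    Documents with no referrals are left unchanged."""
    return {
        doc_id: sep.join([text, *referrals.get(doc_id, [])])
        for doc_id, text in documents.items()
    }


# Hypothetical toy corpus: one document is cited by another, one is not.
docs = {"a": "Paper A.", "b": "Paper B."}
refs = {"a": ["X cites A for tree search."]}
index_text = augment_with_referrals(docs, refs)
print(index_text["a"])
```

The augmented text would then be passed to whatever retriever is already in use, sparse or dense, in place of the original document text.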
5 code implementations • NeurIPS 2023 • Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan
Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference.
Ranked #1 on Arithmetic Reasoning on Game of 24
1 code implementation • 17 May 2023 • Mo Yu, Jiangnan Li, Shunyu Yao, Wenjie Pang, Xiaochen Zhou, Zhou Xiao, Fandong Meng, Jie zhou
As readers engage with a story, their understanding of a character evolves based on new events and information; and multiple fine-grained aspects of personalities can be perceived.
no code implementations • 19 Apr 2023 • Yao Mu, Shunyu Yao, Mingyu Ding, Ping Luo, Chuang Gan
We learn embodied representations of video trajectories, emergent language, and natural language using a language model, which is then used to finetune a lightweight policy network for downstream control.
5 code implementations • NeurIPS 2023 • Noah Shinn, Federico Cassano, Edward Berman, Ashwin Gopinath, Karthik Narasimhan, Shunyu Yao
Large language models (LLMs) have been increasingly used to interact with external environments (e.g., games, compilers, APIs) as goal-driven agents.
no code implementations • CVPR 2023 • Yao Mu, Shunyu Yao, Mingyu Ding, Ping Luo, Chuang Gan
We learn embodied representations of video trajectories, emergent language, and natural language using a language model, which is then used to finetune a lightweight policy network for downstream control.
no code implementations • 15 Oct 2022 • Yi Gu, Shunyu Yao, Chuang Gan, Joshua B. Tenenbaum, Mo Yu
Text games present opportunities for natural language understanding (NLU) methods to tackle reinforcement learning (RL) challenges.
7 code implementations • 6 Oct 2022 • Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao
While large language models (LLMs) have demonstrated impressive capabilities across tasks in language understanding and interactive decision making, their abilities for reasoning (e.g., chain-of-thought prompting) and acting (e.g., action plan generation) have primarily been studied as separate topics.
Ranked #20 on Question Answering on WebQuestions
3 code implementations • 4 Jul 2022 • Shunyu Yao, Howard Chen, John Yang, Karthik Narasimhan
Existing benchmarks for grounding language in interactive environments either lack real-world linguistic elements, or prove difficult to scale up due to substantial human involvement in the collection of data or feedback signals.
1 code implementation • NAACL 2022 • Yisi Sang, Xiangyang Mou, Mo Yu, Shunyu Yao, Jing Li, Jeffrey Stanton
We propose a new task for assessing how well machines understand fictional characters in narrative stories.
1 code implementation • ICLR 2022 • Shunyu Yao, Mo Yu, Yang Zhang, Karthik R Narasimhan, Joshua B. Tenenbaum, Chuang Gan
In this work, we propose a novel way to establish such a link by corpus transfer, i.e., pretraining on a corpus of emergent language for downstream natural language tasks, which is in contrast to prior work that directly transfers speaker and listener parameters.
1 code implementation • ICLR 2022 • Jens Tuyls, Shunyu Yao, Sham Kakade, Karthik Narasimhan
Text adventure games present unique challenges to reinforcement learning methods due to their combinatorially large action spaces and sparse rewards.
no code implementations • 3 Jan 2022 • Shunyu Yao, RuiZhe Zhong, Yichao Yan, Guangtao Zhai, Xiaokang Yang
Specifically, the neural radiance field takes lip movement features and personalized attributes as two disentangled conditions, where lip movements are directly predicted from the audio inputs to achieve lip-synchronized generation.
1 code implementation • CVPR 2021 • Wei Ji, Jingjing Li, Shuang Yu, Miao Zhang, Yongri Piao, Shunyu Yao, Qi Bi, Kai Ma, Yefeng Zheng, Huchuan Lu, Li Cheng
Complex backgrounds and similar appearances between objects and their surroundings are generally recognized as challenging scenarios in Salient Object Detection (SOD).
Ranked #3 on Object Detection on PKU-DDD17-Car
1 code implementation • ACL 2021 • Shunyu Yao, Binghui Peng, Christos Papadimitriou, Karthik Narasimhan
Despite their impressive performance in NLP, self-attention networks were recently proved to be limited for processing formal languages with hierarchical structure, such as $\mathsf{Dyck}_k$, the language consisting of well-nested parentheses of $k$ types.
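For readers unfamiliar with $\mathsf{Dyck}_k$, membership can be checked with a stack. The sketch below assumes $k = 3$ with the bracket pairs `()`, `[]`, and `{}` as a concrete alphabet; it illustrates the language itself, not the self-attention construction studied in the paper.

```python
from typing import Dict

# Assumed alphabet for k = 3: each closing bracket maps to its opener.
PAIRS: Dict[str, str] = {")": "(", "]": "[", "}": "{"}


def is_dyck(s: str, pairs: Dict[str, str] = PAIRS) -> bool:
    """Return True if `s` is a well-nested string over the given bracket types."""
    openers = set(pairs.values())
    stack = []
    for ch in s:
        if ch in openers:
            stack.append(ch)
        elif ch in pairs:
            # A closer must match the most recent unmatched opener.
            if not stack or stack.pop() != pairs[ch]:
                return False
        else:
            return False  # symbol outside the alphabet
    return not stack  # every opener must have been closed


print(is_dyck("([]{})"))  # True
print(is_dyck("([)]"))    # False: brackets cross instead of nesting
```

The stack depth needed grows with the nesting depth of the input, which is the hierarchical structure the abstract refers to.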
no code implementations • NAACL 2021 • Shunyu Yao, Karthik Narasimhan, Matthew Hausknecht
Text-based games simulate worlds and interact with players using natural language.
1 code implementation • ICCV 2021 • Miao Zhang, Jie Liu, Yifei Wang, Yongri Piao, Shunyu Yao, Wei Ji, Jingjing Li, Huchuan Lu, Zhongxuan Luo
Our bidirectional dynamic fusion strategy encourages the interaction of spatial and temporal information in a dynamic manner.
Ranked #15 on Video Polyp Segmentation on SUN-SEG-Easy (Unseen)
1 code implementation • EMNLP 2020 • Shunyu Yao, Rohan Rao, Matthew Hausknecht, Karthik Narasimhan
In this paper, we propose the Contextual Action Language Model (CALM) to generate a compact set of action candidates at each game state.
1 code implementation • NeurIPS 2019 • Kevin Smith, Lingjie Mei, Shunyu Yao, Jiajun Wu, Elizabeth Spelke, Josh Tenenbaum, Tomer Ullman
We also present a new test set for measuring violations of physical expectations, using a range of scenarios derived from developmental psychology.
1 code implementation • NeurIPS 2018 • Shunyu Yao, Tzu Ming Harry Hsu, Jun-Yan Zhu, Jiajun Wu, Antonio Torralba, William T. Freeman, Joshua B. Tenenbaum
In this work, we propose 3D scene de-rendering networks (3D-SDN) to address the above issues by integrating disentangled representations for semantics, geometry, and appearance into a deep generative model.